Abstract

The main task in analyzing a switching network design (including circuit-, multirate-, and photonic-switching) is 
to determine the minimum number of some switching components so that the design is non-blocking in some sense 
(e.g., strict- or wide-sense). We show that, in many cases, this task can be accomplished with a simple two-step 
strategy: (1) formulate a linear program whose optimum value is a bound for the minimum number we are seeking, 
and (2) specify a solution to the dual program, whose objective value by weak duality immediately yields a sufficient 
condition for the design to be non-blocking. 

We illustrate this technique through a variety of examples, ranging from circuit to multirate to photonic switching, 
from unicast to $f$-cast and multicast, and from strict- to wide-sense non-blocking. The switching architectures in the
examples are of Clos-type and Banyan-type, which are the two most popular architectural choices for designing 
non-blocking switching networks. 

To prove the result in the multirate Clos network case, we formulate a new problem called DYNAMIC WEIGHTED 
EDGE COLORING which generalizes the DYNAMIC BIN PACKING problem. We then design an algorithm with competitive
ratio 5.6355 for the problem. The algorithm is analyzed using the linear programming technique. A new
upper bound for multirate wide-sense non-blocking Clos networks follows, improving upon a decade-old bound on
the same problem.

Keywords: Nonblocking, multirate, switching, linear programming, duality, dynamic weighted edge coloring. 



1 Introduction 

The two most important architectures for designing non-blocking switching networks are Clos-type [5] and
Banyan-type [12]. The Clos network not only played a central role in classical circuit-switching theory [1, 15], but also was the
bedrock of multirate switching [4, 11, 19, 22, 25, 32] (e.g., in time-divisioned switching environments where connections
are of varying bandwidth requirements), and photonic switching [13, 24, 27, 28]. The Banyan network is isomorphic to
various other "bit-permutation" networks such as Omega, baseline, etc. [2]; they are called Banyan-type networks and
have been used extensively in designing electronic and optical switches, as well as parallel processor architectures [9].
In particular, the multilog design, which involves the vertical stacking of a number of inverse Banyan planes, has been
used in circuit- and photonic-switching environments because it has small depth ($\log N$), self-routing capability,
and absolute signal loss uniformity [17, 18, 20, 29, 34].

In analyzing Clos networks, the most basic task is to determine the minimum number of middle-stage crossbars
so that the network satisfies a given nonblocking condition. This holds true in space-, multirate-, and photonic-switching,
in unicast, $f$-cast and multicast, and broadcast traffic patterns, and in all nonblocking types (strict-sense,
wide-sense, and rearrangeable). Similarly, analyzing multilog networks often involves determining the minimum
number of Banyan planes so that the network satisfies some requirements. This paper shows that a simple and effective
linear programming (LP) based two-step strategy can be employed in the analysis:



• First, the minimum value we are seeking (e.g., the minimum number of middle-stage crossbars in a Clos network or the
minimum number of Banyan planes in a multilog network) is upper-bounded by the optimum value of a linear
program (LP) of the form $\max\{c^T x \mid Ax \le b,\ x \ge 0\}$. The maximization objective is often required by
worst-case analysis, such as the maximum number of middle-stage crossbars in a Clos network that may be unavailable
to a new request. The constraints of the LP are used to express the fact that no input or output can generate
or receive connection requests totaling more than its capacity.

• Second, by specifying any feasible solution, say $y^*$, to the dual program $\min\{b^T y \mid A^T y \ge c,\ y \ge 0\}$, and applying
weak duality, we can use the dual-objective value $b^T y^*$ as an upper bound for the minimum value being sought.

In some cases, we may not need the second step because the primal LP is small with only a few variables. In 
most cases, however, the LP and its dual are very general, dependent on various parameters of the switch design. In 
such cases, it would be difficult to come up with a primal-optimal solution. Fortunately, we can supply a dual-feasible 
solution to quickly "certify" the bound. 
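
The "certify" step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper name `dual_certificate` and the toy numbers are hypothetical and do not come from the paper. The function checks that a candidate $y$ is dual feasible and, if so, returns $b^T y$, which by weak duality bounds the primal optimum from above.

```python
from fractions import Fraction


def dual_certificate(A, b, c, y):
    """Return b^T y if y is feasible for min{b^T y | A^T y >= c, y >= 0}.

    A is a list of rows (one per primal constraint), b has one entry per row,
    c has one entry per primal variable. By weak duality the returned value
    is an upper bound on max{c^T x | Ax <= b, x >= 0}.
    """
    m, n = len(A), len(c)
    if len(y) != m or any(yi < 0 for yi in y):
        raise ValueError("y must be non-negative with one entry per primal constraint")
    for j in range(n):
        if sum(A[i][j] * y[i] for i in range(m)) < c[j]:
            raise ValueError(f"dual constraint {j} is violated")
    return sum(b[i] * y[i] for i in range(m))


# Toy instance (hypothetical numbers): max x1 + 2*x2 s.t. x1 + x2 <= 4, x2 <= 3.
A = [[1, 1], [0, 1]]
b = [4, 3]
c = [1, 2]
bound = dual_certificate(A, b, c, [Fraction(1), Fraction(1)])
print(bound)
```

Exact rational arithmetic (`fractions.Fraction`) is used so that a feasibility check never fails because of rounding.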



The LP-duality technique was first used in our recent paper [26] to analyze the (unicast) strictly nonblocking
multilog architecture in the photonic-switching case, subject to general crosstalk constraints. This paper demonstrates
that the technique can be applied to a wider range of switching network analysis problems. Our main contributions
are as follows. First, we formulate a new problem called DYNAMIC WEIGHTED EDGE COLORING (DWEC) of graphs,
which generalizes the classic DYNAMIC BIN PACKING problem [6] and the routing problem for multirate wide-sense
nonblocking Clos networks. Using the LP-technique, we design an algorithm with competitive ratio 5.6355. A new
upper bound for the multirate Clos network problem follows. Since BIN PACKING and its variations have been very
useful in both theory and practice, we believe that DWEC and our results on it are of independent interest. Second, we
use the LP-technique to prove general sufficient conditions for the multilog network to be $f$-cast nonblocking under
the so-called window algorithm, both under the link-blocking model and the crosstalk-free model. To the best of our
knowledge, these are the first $f$-cast results for the multilog design. We show that many known results are immediate
corollaries of these general conditions.

The rest of this paper is organized as follows. Section 2 presents notations and terminologies. Section 3 illustrates
the strength of the LP-duality technique on analyzing several problems on the Clos networks. The DWEC problem is
also defined and analyzed. Section 4 proves non-blocking results for the $f$-cast multilog architecture. Section 5 does the
same with the crosstalk constraint.



2 Preliminaries 

Throughout this paper, for any positive integers $k, d$, let $[k]$ denote the set $\{1, \dots, k\}$, let $\mathbb{Z}_d$ denote the set $\{0, \dots, d-1\}$,
which can be thought of as $d$-ary "symbols," let $\mathbb{Z}_d^k$ denote the set of all $d$-ary strings of length $k$, $|s|$ the length of any
$d$-ary string $s$ (e.g., $|3142| = 4$), and $s_{i..j}$ the substring $s_i \cdots s_j$ of a string $s = s_1 \dots s_l \in \mathbb{Z}_d^l$ (if $i > j$ then $s_{i..j}$ is the
empty string).

2.1 Switching environments 

Consider an $N \times N$ switching network, i.e. a switching network with $N$ inputs and $N$ outputs. There are three levels
of nonblockingness of a switching network. A network is rearrangeably nonblocking (RNB) if it can realize any
one-to-one mapping between inputs and outputs simultaneously; it is wide-sense nonblocking (WSNB) if a new request
from a free input to a free output can be realized without disturbing existing connections, as long as all requests are
routed according to some algorithm; finally, it is strictly nonblocking (SNB) if a new request from a free input to a free
output can always be routed no matter how existing connections were arranged. In the multicast case, RNB, WSNB,
and SNB are defined similarly. The reader is referred to [15] for more details on non-blocking concepts.

In circuit switching, a request is a pair $(a, b)$ where $a$ is an unused input and $b$ is an unused output. A route
$R(a, b)$ realizes the request if it does not share any internal link with existing routes. In an $f$-cast switching network,
each multicast request is of the form $(a, B)$ where $a$ is some input and $B$ is a subset of at most $f$ outputs. The number
$f$ is called the fanout restriction. An $N \times N$ multicast network without fanout restriction is equivalent to an $N$-cast
network.

In the multirate case, each link has a capacity (e.g., bandwidth). All inputs and outputs have the same capacity
normalized to 1. An input cannot request more than its capacity. Neither can outputs. A request is of the form $(a, b, w)$
where $a$ is an input, $b$ is an output, and $w \le 1$ is the requested rate. If existing requests have used up to $x$ and $y$ units
of $a$'s and $b$'s capacity, respectively, then the new requested rate $w$ can only be at most $\min\{1 - x, 1 - y\}$. An internal
link cannot carry requests with total rate more than 1.
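
The admissibility rule for a multirate request can be illustrated with a small Python sketch. The function name `admissible` and the dictionary-based bookkeeping are assumptions made for this illustration only; rates are kept as exact fractions.

```python
from fractions import Fraction


def admissible(load_in, load_out, a, b, w):
    """True if request (a, b, w) respects the unit capacities of input a and output b.

    load_in / load_out map an input / output to the rate it already uses
    (hypothetical bookkeeping; missing entries mean an idle port).
    """
    x = load_in.get(a, 0)
    y = load_out.get(b, 0)
    return 0 < w <= min(1 - x, 1 - y)


load_in = {"a": Fraction(1, 2)}
load_out = {"b": Fraction(3, 4)}
print(admissible(load_in, load_out, "a", "b", Fraction(1, 4)))
print(admissible(load_in, load_out, "a", "b", Fraction(1, 3)))
```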

2.2 The 3-stage Clos networks 

The Clos network $C(n_1, r_1, m, n_2, r_2)$ is a 3-stage interconnection network, where the first stage consists of $r_1$ crossbars
of size $n_1 \times m$, the last stage has $r_2$ crossbars of dimension $m \times n_2$, and the middle stage has $m$ crossbars
of dimension $r_1 \times r_2$ (see Figure 1). Each input crossbar $I_i$ ($i = 1, \dots, r_1$) is connected to each middle crossbar
$M_j$ ($j = 1, \dots, m$). Similarly, the middle stage and the last stage are fully connected. When $n_1 = n_2 = n$ and
$r_1 = r_2 = r$, the network is called the symmetric 3-stage Clos network, denoted by $C(n, m, r)$.

Figure 1: The 3-stage Clos network $C(n_1, r_1, m, n_2, r_2)$

2.3 The d-ary multilog networks 

Let $N = d^n$. We consider the $\log_d(N, 0, m)$ network, which denotes the stacking of $m$ copies of the $d$-ary inverse
Banyan network $BY^{-1}(n)$ with $N$ inputs and $N$ outputs. (See Fig. 2 and 4.) Label the inputs and outputs of $BY^{-1}(n)$
and the $d \times d$ switching elements (SE) of each stage of $BY^{-1}(n)$ as illustrated in Fig. 2. We label the inputs and
outputs of a $BY^{-1}(n)$-plane with $d$-ary strings of length $n$. Specifically, each input $u \in \mathbb{Z}_d^n$ and output $v \in \mathbb{Z}_d^n$
have the form $u = u_1 \cdots u_n$, $v = v_1 \cdots v_n$, where $u_i, v_i \in \mathbb{Z}_d$, $\forall i \in [n]$. Also, label the $d \times d$ switching elements
in each of the $n$ stages of a $BY^{-1}(n)$-plane with $d$-ary strings of length $n - 1$. An input $x$ (respectively, output $y$)
is connected to the switching element labeled $x_{1..n-1}$ in the first stage (respectively, $y_{1..n-1}$ in the last stage). A
switching element labeled $z = z_1 \cdots z_{n-1}$ in stage $i \le n - 1$ is connected to $d$ switching elements in stage $i + 1$
numbered $z_1 \cdots z_{i-1} * z_{i+1} \cdots z_{n-1}$, where $*$ is any symbol in $\mathbb{Z}_d$.

For the sake of clarity, let us first consider a small example. Consider the unicast request $(x, y) = (01001, 10101)$
when $d = 2$, $n = 5$. The input $x = 01001$ is connected to the switching element labeled 0100 in the first stage, which
is connected to two switching elements labeled 0100 and 1100 in the second stage, and so on. The unique path from
$x$ to $y$ in the $BY^{-1}(n)$-plane can be explicitly written out (see Figure 3):

    input x                        01001
    stage-1 switching element      0100
    stage-2 switching element      1100
    stage-3 switching element      1000
    stage-4 switching element      1010
    stage-5 switching element      1010
    output y                       10101

Figure 2: The inverse Banyan network $BY^{-1}(3)$

Figure 3: The inverse Banyan network $BY^{-1}(5)$



We can see the pattern clearly: the prefixes of $y_{1..n-1}$ are "taking over" the prefixes of $x_{1..n-1}$ on the path from $x$ to
$y$. In general, the unique path $R(x, y)$ in a $BY^{-1}(n)$-plane from an arbitrary input $x$ to an arbitrary output $y$ is exactly
the following:

    input x                        x_1 x_2 ... x_{n-1} x_n
    stage-1 switching element      x_1 x_2 ... x_{n-1}
    stage-2 switching element      y_1 x_2 ... x_{n-1}
    stage-3 switching element      y_1 y_2 x_3 ... x_{n-1}
    ...
    stage-n switching element      y_1 y_2 ... y_{n-1}
    output y                       y_1 y_2 ... y_{n-1} y_n
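
The general path pattern above translates directly into code. The following Python sketch is an illustration, with the function name `route` chosen here and labels represented as strings of digits (an assumption that limits it to $d \le 10$). It lists the switching elements visited at stages 1 through $n$.

```python
def route(x, y):
    """Labels of the switching elements visited at stages 1..n on the unique
    path from input x to output y in one inverse Banyan plane.

    x and y are d-ary strings of equal length n; following the pattern in the
    text, the stage-j element is labeled y_1..y_{j-1} x_j..x_{n-1}.
    """
    if len(x) != len(y):
        raise ValueError("input and output labels must have the same length")
    n = len(x)
    # j is the 0-based stage index, so stage j+1 keeps the first j symbols of y.
    return [y[:j] + x[j:n - 1] for j in range(n)]


# The d = 2, n = 5 request (01001, 10101) used as the running example.
for stage, label in enumerate(route("01001", "10101"), start=1):
    print(f"stage-{stage} switching element: {label}")
```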



Now, consider two unicast requests $(a, b)$ and $(x, y)$. From the observation above, on the same $BY^{-1}(n)$-plane
the two routes $R(a, b)$ and $R(x, y)$ share a switching element (also called a node) if and only if there is some $j \in [n]$
such that $b_{1..j-1} = y_{1..j-1}$ and $a_{j..n-1} = x_{j..n-1}$. In this case, the two paths intersect at a stage-$j$ switching element.

It should be noted that two requests' paths may intersect at more than one switching element.

For any two $d$-ary strings $u, v \in \mathbb{Z}_d^l$, let PRE$(u, v)$ denote the longest common prefix, and SUF$(u, v)$ denote the
longest common suffix of $u$ and $v$, respectively. For example, if $u = 0100110$ and $v = 0101010$, then PRE$(u, v) = 010$
and SUF$(u, v) = 10$. The following propositions straightforwardly follow (for more details, see e.g. p5)).

Figure 4: A multi-log network with 3 inverse Banyan planes

Proposition 2.1. Let $(a, b)$ and $(u, v)$ be two unicast requests. Then their corresponding routes $R(a, b)$ and $R(u, v)$
in a $BY^{-1}(n)$-plane share at least a common SE if and only if

\[ |\mathrm{SUF}(a_{1..n-1}, u_{1..n-1})| + |\mathrm{PRE}(b_{1..n-1}, v_{1..n-1})| \ge n - 1. \qquad (1) \]

Moreover, the routes $R(a, b)$ and $R(u, v)$ intersect at exactly one SE if and only if

\[ |\mathrm{SUF}(a_{1..n-1}, u_{1..n-1})| + |\mathrm{PRE}(b_{1..n-1}, v_{1..n-1})| = n - 1, \qquad (2) \]

in which case the common SE is an SE at stage $|\mathrm{PRE}(b_{1..n-1}, v_{1..n-1})| + 1$ of the $BY^{-1}(n)$-plane.

Proposition 2.2. Let $(a, b)$ and $(u, v)$ be two unicast requests. Then their corresponding routes $R(a, b)$ and $R(u, v)$
in a $BY^{-1}(n)$-plane share at least a common link iff

\[ |\mathrm{SUF}(a_{1..n-1}, u_{1..n-1})| + |\mathrm{PRE}(b_{1..n-1}, v_{1..n-1})| \ge n. \qquad (3) \]
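
Both propositions can be checked exhaustively on small planes. The Python sketch below is an illustration only; the function names are chosen here, and a "link" is modelled as a pair of switching elements at consecutive stages, so the links attaching inputs and outputs to the plane are ignored (an assumption of this sketch). It compares conditions (1)-(3) against explicitly constructed routes.

```python
from itertools import product


def pre_len(u, v):
    """Length of the longest common prefix of u and v."""
    k = 0
    while k < min(len(u), len(v)) and u[k] == v[k]:
        k += 1
    return k


def suf_len(u, v):
    """Length of the longest common suffix of u and v."""
    return pre_len(u[::-1], v[::-1])


def route(x, y):
    """Stage-1..stage-n switching-element labels on the path from x to y."""
    n = len(x)
    return [y[:j] + x[j:n - 1] for j in range(n)]


def lhs(a, b, u, v):
    """Left-hand side of conditions (1)-(3)."""
    return suf_len(a[:-1], u[:-1]) + pre_len(b[:-1], v[:-1])


def check_propositions(d, n):
    """Compare Propositions 2.1 and 2.2 with explicit routes for all request pairs."""
    labels = ["".join(s) for s in product("0123456789"[:d], repeat=n)]
    for a, b, u, v in product(labels, repeat=4):
        r1, r2 = route(a, b), route(u, v)
        shared = [j for j in range(n) if r1[j] == r2[j]]  # 0-based stage indices
        # Modelling assumption: a shared link = shared elements at two consecutive stages.
        shares_link = any(j + 1 in shared for j in shared)
        s = lhs(a, b, u, v)
        assert (len(shared) >= 1) == (s >= n - 1)          # condition (1)
        assert (len(shared) == 1) == (s == n - 1)          # condition (2)
        if len(shared) == 1:
            assert shared[0] + 1 == pre_len(b[:-1], v[:-1]) + 1
        assert shares_link == (s >= n)                     # condition (3)
    return True


print(check_propositions(2, 3))
```

The check is brute force (it enumerates $d^{4n}$ request pairs), so it is only meant for very small $d$ and $n$.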

3 Results on the Clos Networks 

3.1 Two classic examples in circuit switching 

To illustrate the LP-duality technique, we begin with two simple examples which have become classic textbook
material.

Example 3.1 (The SNB Case). Consider the symmetric Clos network $C(n, m, r)$. Consider a new request from an
input of input crossbar $I$ to an output of output crossbar $O$. A middle crossbar cannot carry this request if it already
carries some request from $I$ or some request to $O$. Let $x$ (resp. $y$) be the number of middle crossbars which already
carry some requests from $I$ (resp. to $O$). Since the number of existing requests from $I$ or to $O$ is at most $n - 1$, we
have $x \le n - 1$ and $y \le n - 1$. The number of unavailable middle crossbars is thus bounded above by the optimal
value of the LP

\[ \max\{x + y \mid x \le n - 1,\ y \le n - 1,\ x, y \ge 0\}. \]

The dual program is

\[ \min\{(n - 1)(\alpha + \beta) \mid \alpha \ge 1,\ \beta \ge 1,\ \alpha, \beta \ge 0\}. \]

Setting $\alpha = \beta = 1$ is certainly dual-feasible, and thus its objective value $2n - 2$ is an upper bound on the number of
unavailable middle crossbars. We conclude that $m \ge 2n - 1$ is sufficient for $C(n, m, r)$ to be SNB.
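
For readers who want the weak-duality step of this example spelled out, the following worked inequality chain may help. It is a sketch that uses only the constraints stated above: for any primal-feasible $(x, y)$ and any dual-feasible $(\alpha, \beta)$,

```latex
\[
  x + y \;\le\; \alpha x + \beta y \;\le\; \alpha (n-1) + \beta (n-1) \;=\; (n-1)(\alpha + \beta).
\]
% First inequality: \alpha, \beta \ge 1 and x, y \ge 0.
% Second inequality: x, y \le n-1 and \alpha, \beta \ge 0.
% With \alpha = \beta = 1 the right-hand side equals 2n - 2.
```

The bound is attained at $x = y = n - 1$, so in this small case the dual certificate is tight.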
Example 3.2 (The WSNB Case). This example is a classic result by Benes [1]. Consider the $C(n, m, 2)$ network.
The routing algorithm is simply the following rule: reuse a busy middle crossbar whenever possible.

For any $i, j \in \{1, 2\}$, let $M_{ij}$ be the set of middle crossbars carrying an $(I_i, O_j)$-request. The sets $M_{ij}$ certainly
change over time as requests come and go. However, it is easy to show by induction that the routing rule ensures
$|M_{11} \cup M_{22}| \le n$ and $|M_{12} \cup M_{21}| \le n$ at all times. To see this, without loss of generality consider a new
$(I_1, O_1)$-request. If we can find a crossbar in $M_{22}$ to route the new request, then the union $M_{11} \cup M_{22}$ does not change and
thus $|M_{11} \cup M_{22}| \le n$ by the induction hypothesis. If no crossbar in $M_{22}$ is available for the new request, then it
must be the case that $M_{22} \subseteq M_{11}$. There are at most $n - 1$ existing requests out of $I_1$. Thus, $|M_{11}| \le n - 1$. Hence,
before routing the new $(I_1, O_1)$-request we have $|M_{11} \cup M_{22}| = |M_{11}| \le n - 1$. Consequently, after realizing the new
request, we have $|M_{11} \cup M_{22}| \le n$.

Next, again without loss of generality, consider a new request from $I_1$ to $O_1$. If $M_{22} \setminus M_{11} \ne \emptyset$, then we
have a busy crossbar to reuse. Otherwise, the number of unavailable middle crossbars for this new request is precisely
$|M_{11} \cup M_{12} \cup M_{21}| = |M_{11}| + |M_{12} \cup M_{21}|$. Just before the arrival of this new request, the number of existing requests
from $I_1$ or to $O_1$ is at most $n - 1$, i.e. $|M_{11} \cup M_{12}| = |M_{11}| + |M_{12}| \le n - 1$ and $|M_{11} \cup M_{21}| = |M_{11}| + |M_{21}| \le n - 1$.
The number of unavailable middle crossbars is thus bounded by the optimal value of the following LP, where we think
of set cardinalities as variables:

\[
\begin{array}{lrcl}
\max & |M_{11}| + |M_{12} \cup M_{21}| \\
\text{s.t.} & |M_{11}| + |M_{12}| &\le& n - 1 \\
 & |M_{11}| + |M_{21}| &\le& n - 1 \\
 & |M_{12}| + |M_{21}| &\le& n \\
 & |M_{12} \cup M_{21}| - |M_{12}| - |M_{21}| &\le& 0
\end{array}
\]

The last inequality is the straightforward union bound. Obviously, all cardinalities are non-negative. The dual LP is

\[
\begin{array}{lrcl}
\min & (n - 1)(y_1 + y_2) + n y_3 \\
\text{s.t.} & y_1 + y_2 &\ge& 1 \\
 & y_2 + y_3 - y_4 &\ge& 0 \\
 & y_1 + y_3 - y_4 &\ge& 0 \\
 & y_4 \ge 1, \quad y_1, y_2, y_3 &\ge& 0
\end{array}
\]

Setting $y_1 = y_2 = y_3 = 1/2$ and $y_4 = 1$ is certainly dual-feasible with objective value $3n/2 - 1$. Hence, by weak
duality the number of unavailable middle crossbars for the new $(I_1, O_1)$-request is at most $\lfloor 3n/2 \rfloor - 1$, which means
$m \ge \lfloor 3n/2 \rfloor$ is sufficient for $C(n, m, 2)$ to be WSNB. It is not hard to show that $m \ge \lfloor 3n/2 \rfloor$ is also necessary [1].
This ($r = 2$) is the only case for which a necessary and sufficient condition is known for the Clos network $C(n, m, r)$
to be WSNB!
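
The dual certificate of this example can be verified mechanically. The Python sketch below is an illustration under stated assumptions: the function names are chosen here, and the constraint matrix is written with the primal variables ordered as $|M_{11}|, |M_{12}|, |M_{21}|, |M_{12} \cup M_{21}|$. It checks dual feasibility of $y = (1/2, 1/2, 1/2, 1)$ and, for small $n$, compares the resulting bound with a brute-force search over integral primal solutions.

```python
from fractions import Fraction
from itertools import product


def dual_value(n):
    """Check the dual solution y = (1/2, 1/2, 1/2, 1) and return its objective value."""
    A = [[1, 1, 0, 0],     # |M11| + |M12| <= n - 1
         [1, 0, 1, 0],     # |M11| + |M21| <= n - 1
         [0, 1, 1, 0],     # |M12| + |M21| <= n
         [0, -1, -1, 1]]   # union bound
    b = [n - 1, n - 1, n, 0]
    c = [1, 0, 0, 1]       # objective |M11| + |M12 u M21|
    y = [Fraction(1, 2), Fraction(1, 2), Fraction(1, 2), Fraction(1)]
    for j in range(4):     # one dual constraint per primal variable
        assert sum(A[i][j] * y[i] for i in range(4)) >= c[j]
    return sum(b[i] * y[i] for i in range(4))


def best_integral_primal(n):
    """Brute-force the largest integral primal objective (small n only)."""
    best = 0
    for m11, m12, m21 in product(range(n + 1), repeat=3):
        if m11 + m12 <= n - 1 and m11 + m21 <= n - 1 and m12 + m21 <= n:
            # The union variable can be pushed up to m12 + m21 by the last constraint.
            best = max(best, m11 + m12 + m21)
    return best


for n in range(1, 7):
    print(n, best_integral_primal(n), dual_value(n))
```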

3.2 Multirate switching and the DWEC problem

It is known that $C(n, m, r)$ is multirate WSNB when $m \ge 5.75n$ [11]. This section uses the LP technique to improve
this bound via solving a much more general problem called DYNAMIC WEIGHTED EDGE COLORING (DWEC).

Definition 3.3 (The DWEC problem). Let $G = (V, E)$ be a fixed simple graph called the base graph. Let $G_0 = (V, \emptyset)$
be an empty graph with the same vertex set. At time $t$, either an arbitrary edge $e$ is removed from $G_{t-1}$, in which case
$G_t = G_{t-1} - \{e\}$, or a copy of some edge $e \in E$ "arrives" along with a weight $w_e \in (0, 1]$, in which case define
$G_t = G_{t-1} \cup \{e\}$. Note that $G_t$ can be a multi-graph as many copies of the same edge may arrive over time. The
arriving edge is to be colored so that, in $G_t$, the total weight of same-color edges incident to the same vertex is at most
1.

The objective is to design a coloring algorithm so that the number of colors used is minimized, compared to an
off-line algorithm which colors edges of $G_t$ subject to the same constraint. Formally, let $\mathrm{opt}(t)$ denote the number of
colors used by an optimal off-line algorithm on $G_t$. Let $\mathrm{OPT}(t) = \max_{i \le t} \mathrm{opt}(i)$. For any online coloring algorithm
$A$, let $A(t)$ be the number of colors ever used by $A$ up to time $t$. Algorithm $A$ has competitive ratio $\rho$ if, for any
sequence of edge arrivals/departures with arbitrary weights, we always have $A(t) \le \rho \cdot \mathrm{OPT}(t)$, $\forall t$.

The DYNAMIC BIN PACKING problem is exactly the DWEC problem when the base graph $G = K_2$, where each
color is a bin. The best competitive ratio for DYNAMIC BIN PACKING is known to be between 2.5 and 2.788 [6]. We
will show that the DWEC's best competitive ratio is somewhere between 4 and 5.6355 for any base graph $G$.

Theorem 3.4. There is an algorithm for DWEC with competitive ratio 5.6355. 

Proof. For the sake of presentation clarity, we will prove a slightly weaker ratio of 5.675, and then indicate how to 
obtain the better ratio 5.6355. The two proofs are identical, but the one we present is cleaner. 

At any time $t$, let $W_u(t)$ denote the total weight of edges incident to $u$ in $G_t$, and let $d_u(t)$ denote the number of
edges of weight $> 1/2$ incident to $u$. Let $W(t) = \max_{i \le t} \max_u W_u(i)$ and $\Delta(t) = \max_{i \le t} \max_u d_u(i)$. It is not
hard to see that $\lceil W(t) \rceil \le \mathrm{OPT}(t)$ and $\Delta(t) \le \mathrm{OPT}(t)$.

Refer to an edge as a type-0, type-1, type-2, or type-3 edge if its weight belongs to the interval $(\frac{1}{2}, 1]$, $(\frac{2}{5}, \frac{1}{2}]$, $(\frac{1}{3}, \frac{2}{5}]$,
or $(0, \frac{1}{3}]$, respectively. Our coloring algorithm is as follows. Maintain 4 disjoint sets of colors $C_i(t)$, $0 \le i \le 3$.
Let $x_0, x_1, x_2, x_3$ be constants to be determined. For each $i = 0..3$, we will maintain the following time-invariant
conditions: $|C_i(t)| = \lceil x_i W(t) \rceil$ for $1 \le i \le 3$ and $|C_0(t)| = \lceil x_0 \Delta(t) \rceil$.

If $W(t)$ or $\Delta(t)$ is increased at some time $t$ then we are allowed to add new colors to the sets $C_i(t)$ to maintain
the invariants. Note that $W(t)$ and $\Delta(t)$ are non-decreasing in $t$; hence, colors will never be removed from the $C_i(t)$.
The colors in $C_0(t)$ are used exclusively for edges of type-0. The coloring for edges of types $i$, $1 \le i \le 3$, is done as
follows. If a type-$i$ edge arrives at time $t$, find a color in $C_i(t)$ to color it. If $C_i(t)$ cannot accommodate this edge, try
$C_{i+1}(t)$, and so on until $C_3(t)$. We next show that if the constants $x_i$ are feasible solutions to a certain LP, then it is
always possible to color an arriving edge.
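
To make the coloring rule concrete, here is a minimal Python sketch of it. Several details are assumptions of this illustration rather than statements from the paper: the class and method names are hypothetical, colors are represented as pairs (class, index), a first-fit scan is used inside each color class, weights are exact fractions, and the constants $x_0, \dots, x_3$ are fixed to the values $2, 3/8, 3/10, 3$ that the proof selects later.

```python
from fractions import Fraction
from math import ceil

# x_0..x_3 as chosen in the proof of the 5.675 ratio.
X = [Fraction(2), Fraction(3, 8), Fraction(3, 10), Fraction(3)]


def edge_type(w):
    """Type 0..3 of an edge of weight w, following the four intervals in the text."""
    if w > Fraction(1, 2):
        return 0
    if w > Fraction(2, 5):
        return 1
    if w > Fraction(1, 3):
        return 2
    return 3


class DwecColoring:
    def __init__(self):
        self.weight = {}          # vertex -> current total incident weight W_u
        self.heavy = {}           # vertex -> current number of incident edges of weight > 1/2
        self.W = Fraction(0)      # running maximum of W_u over time
        self.Delta = 0            # running maximum of d_u over time
        self.size = [0, 0, 0, 0]  # |C_0|, ..., |C_3|
        self.load = {}            # (class, index, vertex) -> weight of that color at the vertex

    def _fits(self, color, u, v, w):
        return all(self.load.get(color + (z,), 0) + w <= 1 for z in (u, v))

    def arrive(self, u, v, w):
        """Color a new edge (u, v), u != v, of weight w; returns the color (class, index)."""
        w = Fraction(w)
        for z in (u, v):
            self.weight[z] = self.weight.get(z, 0) + w
            if w > Fraction(1, 2):
                self.heavy[z] = self.heavy.get(z, 0) + 1
            self.W = max(self.W, self.weight[z])
            self.Delta = max(self.Delta, self.heavy.get(z, 0))
        # Grow the color classes so that the size invariants hold again.
        self.size[0] = ceil(X[0] * self.Delta)
        for i in (1, 2, 3):
            self.size[i] = ceil(X[i] * self.W)
        t = edge_type(w)
        classes = [0] if t == 0 else range(t, 4)  # type i tries C_i, C_{i+1}, ..., C_3
        for i in classes:
            for idx in range(self.size[i]):
                if self._fits((i, idx), u, v, w):
                    for z in (u, v):
                        key = (i, idx, z)
                        self.load[key] = self.load.get(key, 0) + w
                    return (i, idx)
        raise RuntimeError("no color available")

    def depart(self, u, v, w, color):
        """Remove a previously colored edge; color classes never shrink."""
        w = Fraction(w)
        for z in (u, v):
            self.weight[z] -= w
            if w > Fraction(1, 2):
                self.heavy[z] -= 1
            self.load[color + (z,)] -= w
```

A caller would keep the color returned by `arrive` and pass it back to `depart` when the edge leaves. Under the analysis that follows, the `RuntimeError` branch should never be reached for these constants.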

Suppose a type-0 edge $e = (u, v)$ arrives at time $t$. If we cannot find a color in $C_0(t)$ for $e$, then
$|C_0(t)| \le (d_u(t) - 1) + (d_v(t) - 1) < 2\Delta(t)$. Hence, as long as $x_0 \ge 2$ we can color $e$.

Next, suppose $e = (u, v)$ of type 1 arrives at time $t$ and we cannot find a color in $C_1(t) \cup C_2(t) \cup C_3(t)$ to color $e$.
For a color $c \in C_1(t)$ to be unavailable for $e$, there must be at least two type-1 color-$c$ edges incident to either $u$ or $v$.
Thus, the total type-1 weight at $u$ and $v$ carried by colors in $C_1(t)$ is $> \frac{4}{5}|C_1(t)|$. Similarly, for each color $c$ in $C_2(t)$, the color-$c$ weight incident
to $u$ or to $v$ must be $> 1/2$, which means this color $c$ "carries" either at least two type-1 edges, or one type-1 edge and
one type-2 edge, or at least two type-2 edges. Thus, the total weight at $u$ and $v$ carried by colors in $C_2(t)$ must be $> \frac{2}{3}|C_2(t)|$.
Lastly, each color $c$ in $C_3(t)$ carries weight $> 1/2$ at $u$ or at $v$, so the total weight at $u$ and $v$ carried by colors in $C_3(t)$ must be $> \frac{1}{2}|C_3(t)|$. Note that the total
weight at $u$ and $v$ is $\le 2W(t)$. Consequently, we will be able to find a color for $e$ if

\[ \tfrac{4}{5}|C_1(t)| + \tfrac{2}{3}|C_2(t)| + \tfrac{1}{2}|C_3(t)| \ge 2W(t), \]

which would hold if $\frac{4}{5}x_1 + \frac{2}{3}x_2 + \frac{1}{2}x_3 \ge 2$. Similarly, a newly arriving type-2 edge is colorable if $\frac{2}{3}x_2 + \frac{3}{5}x_3 \ge 2$,
and a new type-3 edge is colorable if $\frac{2}{3}x_3 \ge 2$. Consequently, our coloring algorithm works if the $x_i$ are feasible for
the following LP:

\[
\begin{array}{lrcl}
\min & x_0 + x_1 + x_2 + x_3 \\
\text{s.t.} & x_0 &\ge& 2 \\
 & \tfrac{4}{5}x_1 + \tfrac{2}{3}x_2 + \tfrac{1}{2}x_3 &\ge& 2 \\
 & \tfrac{2}{3}x_2 + \tfrac{3}{5}x_3 &\ge& 2 \\
 & \tfrac{2}{3}x_3 &\ge& 2 \\
 & x_0, x_1, x_2, x_3 &\ge& 0.
\end{array}
\]

The solution $x_0 = 2$, $x_1 = 3/8$, $x_2 = 3/10$, $x_3 = 3$ is certainly feasible. The total number of colors used is

\[ \lceil x_0 \Delta(t) \rceil + \sum_{i=1}^{3} \lceil x_i W(t) \rceil \le (x_0 + x_1 + x_2 + x_3)\,\mathrm{OPT}(t) + \frac{7}{8} + \frac{9}{10} \le 5.675\,\mathrm{OPT}(t) + 1.8. \]

As is customary in online/dynamic algorithm analysis, we ignore the constant term of 1.8, as we let $\mathrm{OPT}(t) \to \infty$. To
prove the better ratio 5.6355, divide the rates into 5 types belonging to the intervals $(1/2, 1]$, $(2/5, 1/2]$, $(1/3, 2/5]$,
$(11/43, 1/3]$, and $(0, 11/43]$. □
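
The feasibility claim and the value 5.675 are easy to confirm with exact arithmetic. In the Python sketch below, the constraint coefficients $4/5, 2/3, 1/2, 3/5$ are the ones implied by the four weight intervals of the weaker proof. They are written out here as an assumption of this illustration, and the names `ROWS` and `is_feasible` are hypothetical.

```python
from fractions import Fraction as F

# Constraint rows of the LP, as coefficients of (x0, x1, x2, x3); each row must be >= 2.
ROWS = [
    (F(1), F(0), F(0), F(0)),            # type-0 edges: x0 >= 2
    (F(0), F(4, 5), F(2, 3), F(1, 2)),   # type-1 edges
    (F(0), F(0), F(2, 3), F(3, 5)),      # type-2 edges
    (F(0), F(0), F(0), F(2, 3)),         # type-3 edges
]


def is_feasible(x):
    """True if x = (x0, x1, x2, x3) is non-negative and satisfies every row."""
    return all(xi >= 0 for xi in x) and all(
        sum(c * xi for c, xi in zip(row, x)) >= 2 for row in ROWS
    )


x = (F(2), F(3, 8), F(3, 10), F(3))
ratio = sum(x)
print(is_feasible(x), ratio, float(ratio))
```

With these values every constraint holds with equality, which is consistent with the solution having been obtained by back-substitution from the last constraint upward.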
Corollary 3.5. The Clos network $C(n, m, r)$ is multirate WSNB if $m \ge 5.6355n + 4$.

Proof. Consider the multirate WSNB problem on the Clos network $C(n, m, r)$. We formulate a DWEC instance
generalizing the problem. The base graph is the complete bipartite graph $G = \mathcal{I} \times \mathcal{O}$, where $\mathcal{I}$ is the set of input
crossbars and $\mathcal{O}$ is the set of output crossbars. When a new request $(a, b, w)$ arrives at time $t$, add an edge $e = (I, O)$
to $G_{t-1}$ where $I$ is the input crossbar to which $a$ belongs and $O$ is the output crossbar to which $b$ belongs. Set the
edge weight $w_e = w$. Think of each middle crossbar as a color. Obviously, the maximum number of colors ever used
by an algorithm $A$ is also a sufficient number of middle crossbars for $C(n, m, r)$ to be non-blocking.

In the above algorithm, $\Delta(t) \le n$ because the number of requests with rate $> 1/2$ coming out of the same input
crossbar or into the same output crossbar is at most $n$ (one per input/output). Moreover, $W(t) \le n$ because the total
rate of requests from/to an input/output crossbar is at most $n$. Hence, the number of middle-stage crossbars (i.e. colors) needed
is at most $5.6355n + 4$. □

Remark 3.6. Our strategy can also give a better sufficient condition than the best known in pT| for the case when 
there's internal speedup in the Clos network. However, for the ease of exposition, we refrain from stating the most 
general result we can prove. 



4 Analyzing $f$-cast wide-sense nonblocking multilog networks

Let $f, t$ be given integers with $0 \le t \le n$, and $1 \le f \le N = d^n$. This section analyzes $f$-cast wide-sense nonblocking
$\log_d(N, 0, m)$ networks under the window algorithm with window size $d^t$. The algorithm was proposed and analyzed
for one window size ($d^{\lfloor n/2 \rfloor}$) in jjj and later analyzed more carefully for varying window sizes in jTj. Both papers
considered the multicast case with no fanout restriction. We will derive a more general theorem for the $f$-cast case.

• The Window Algorithm with window size $d^t$: Given any integer $t$, $0 \le t \le n$, divide the outputs into
"windows" of size $d^t$ each. Each window consists of all outputs sharing a prefix of length $n - t$, for a total of
$d^{n-t}$ windows. Denote the windows by $W_w$, $0 \le w \le d^{n-t} - 1$. Given a new multicast request $(a, B)$, where
$a$ is an input and $B$ is a subset of outputs, the routing rule is, for every $0 \le w \le d^{n-t} - 1$, the subrequest
$(a, B \cap W_w)$ is routed entirely on one single $BY^{-1}(n)$-plane. (Different sub-requests can be routed through the
same or different $BY^{-1}(n)$-planes.)
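
The partition of a destination set into windows is a simple grouping by prefix. The Python sketch below illustrates it; the function name `split_into_windows` is chosen here, and outputs are assumed to be given as $d$-ary strings of equal length $n$. It returns the non-empty subrequests $B \cap W_w$, keyed by the common prefix of length $n - t$.

```python
from collections import defaultdict


def split_into_windows(B, t):
    """Group outputs by their prefix of length n - t.

    B is an iterable of d-ary strings of length n; each returned group is one
    subrequest that the window algorithm routes on a single plane.
    """
    windows = defaultdict(set)
    for out in B:
        n = len(out)
        windows[out[:n - t]].add(out)
    return dict(windows)


B = {"000", "001", "010", "111"}
for prefix, group in sorted(split_into_windows(B, 1).items()):
    print(prefix, sorted(group))
```

The two extreme cases behave as expected: with $t = 0$ every destination forms its own subrequest, and with $t = n$ the whole request is a single subrequest.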



Remark 4.1. There is a subtle point about the window algorithm due to which the original authors in [31] thought their
multilog network was strictly nonblocking instead of wide-sense nonblocking. Basically, for some specific values of
the parameters the algorithm is no algorithm at all. In those cases, any sufficient condition for the network to be
nonblocking under the window algorithm is in fact a strictly nonblocking condition, not a wide-sense nonblocking
condition.

For example, in the unicast case we have $f = 1$, which means the window algorithm does not specify any routing
strategy; consequently, any nonblocking condition is actually a strictly non-blocking condition. Another example is
when $t = 0$. In this case, the routing rule says that each branch of a (multicast) request should be routed on some
plane, independent of other branches. Because there is no restriction on how to route the branches, any nonblocking
condition is a strictly non-blocking one.

Yet another example is when $t = n$. Here, the routing rule is for each request to be routed entirely on some plane.
If the $1 \times m$-SE stage of the multilog network has fanout capability, then the rule does restrict how we route requests,
and thus we indeed have a wide-sense nonblocking situation. However, if the $1 \times m$-SE stage is implemented with
$1 \times m$-unicast crossbars or $1 \times m$-demultiplexers, then we have to route each request entirely on some plane. Thus,
any sufficient condition is a strictly nonblocking condition.



4.1 Setting up the linear program and its dual 

Let $(a, B)$ be an arbitrary $f$-cast request to be routed using the window algorithm with window size $d^t$. Following the
window algorithm, due to symmetry, without loss of generality we can assume that $B = \{b^{(1)}, \dots, b^{(k)}\}$ where all
the outputs $b^{(l)}$ ($l \in [k]$) belong to the same window $W_0$, and $k \le \min\{f, d^t\}$. The $b^{(l)}$ thus share a common prefix
of length $n - t$. (This is because subrequests to the same window are routed through the same plane and different
subrequests of the same request are routed independently from each other and they do not block one another.)

For each $i \in \{0, \dots, n - 1\}$, let $A_i$ be the set of inputs $u$ other than $a$, where $u_{1..n-1}$ shares a suffix of length
exactly $i$ with $a_{1..n-1}$. Formally, define

\[ A_i := \{ u \in \mathbb{Z}_d^n - \{a\} \mid |\mathrm{SUF}(u_{1..n-1}, a_{1..n-1})| = i \}. \]

For each $j \in \{0, \dots, n - 1\}$, let $B_j$ be the set of outputs other than those in $B$ which share a prefix of length exactly
$j$ with some member of $B$, namely

\[ B_j := \{ v \in \mathbb{Z}_d^n - B \mid \exists l \in [k],\ |\mathrm{PRE}(v_{1..n-1}, b^{(l)}_{1..n-1})| = j \}. \]

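Note that

\[ |A_i| = d^{n-i} - d^{n-i-1}, \quad 0 \le i \le n - 1, \]

\[ |B_j| = d^{n-j} - d^{n-j-1}, \quad 0 \le j \le n - t - 1. \]

Define $A = \bigcup_{i=0}^{n-1} A_i$. Then,

\[ \bigcup_{j=0}^{n-t-1} B_j = \bigcup_{w=1}^{d^{n-t}-1} W_w, \qquad \bigcup_{j=n-t}^{n-1} B_j = W_0 - B. \]

Furthermore, for each $j \le n - t - 1$, $B_j$ is the disjoint union of precisely $d^{n-t-j} - d^{n-t-j-1}$ windows each of size $d^t$.

Note that the sets $B_j$ for $0 \le j \le n - t - 1$ are mutually disjoint. On the other hand, the sets $B_j$ for $n - t \le j \le n - 1$
are not necessarily disjoint, because for the same output $v \in W_0 - B$ it might be the case that $|\mathrm{PRE}(v_{1..n-1}, b^{(l)}_{1..n-1})| = j$
and $|\mathrm{PRE}(v_{1..n-1}, b^{(l')}_{1..n-1})| = j'$ for $j \ne j'$, $l \ne l'$. The following simple observation turns out to be an important
analytical detail in many of the proofs.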

Proposition 4.2. Let $q$ be an integer such that $n - t \le q \le n - 1$. Then,

\[ \Bigl| \bigcup_{j=q}^{n-1} B_j \Bigr| \le \min\{ d^t - k,\ k(d^{n-q} - 1) \}, \]

and

\[ \Bigl| \bigcup_{j=n-t}^{n-1} B_j \Bigr| = d^t - k. \]

Proof. To see the inequality, note that $|\bigcup_{j=q}^{n-1} B_j|$ counts the number of strings $v$ in $W_0 - B$ for which
$|\mathrm{PRE}(v_{1..n-1}, b^{(l)}_{1..n-1})| \ge q$ for some $b^{(l)}$, $l \in [k]$. As $|W_0| = d^t$, the upper bound $d^t - k$ for the number of such strings is trivial. On the other
hand, the number of strings $v$ where $|\mathrm{PRE}(v_{1..n-1}, b^{(l)}_{1..n-1})| \ge q$ for a fixed string $b^{(l)}$ is at most $d^{n-q} - 1$ (discounting
$b^{(l)}$ itself). Hence, we get the upper bound $k(d^{n-q} - 1)$ via a simple application of the union bound. The equality
trivially holds because $\bigcup_{j=n-t}^{n-1} B_j = W_0 - B$. □
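
The cardinalities of the sets $A_i$ and $B_j$ and the two claims of Proposition 4.2 can be sanity-checked by enumeration on small instances. The Python sketch below is an illustration only: the function names are chosen here, labels are digit strings (so $d \le 10$), and the caller is assumed to pass a destination set $B$ that lies inside a single window.

```python
from itertools import product


def pre_len(u, v):
    """Length of the longest common prefix of u and v."""
    k = 0
    while k < min(len(u), len(v)) and u[k] == v[k]:
        k += 1
    return k


def suf_len(u, v):
    """Length of the longest common suffix of u and v."""
    return pre_len(u[::-1], v[::-1])


def build_sets(d, n, a, B):
    """Return the lists [A_0..A_{n-1}] and [B_0..B_{n-1}] for input a and destinations B."""
    strings = ["".join(s) for s in product("0123456789"[:d], repeat=n)]
    A = [set() for _ in range(n)]
    Bs = [set() for _ in range(n)]
    for u in strings:
        if u != a:
            A[suf_len(u[:-1], a[:-1])].add(u)
    for v in strings:
        if v not in B:
            for b in B:
                Bs[pre_len(v[:-1], b[:-1])].add(v)
    return A, Bs


def check_counts(d, n, t, a, B):
    """Assert the cardinality formulas and Proposition 4.2 on one small instance."""
    k = len(B)
    A, Bs = build_sets(d, n, a, B)
    for i in range(n):
        assert len(A[i]) == d ** (n - i) - d ** (n - i - 1)
    for j in range(n - t):
        assert len(Bs[j]) == d ** (n - j) - d ** (n - j - 1)
    for q in range(n - t, n):
        union = set().union(*Bs[q:])
        assert len(union) <= min(d ** t - k, k * (d ** (n - q) - 1))
    assert len(set().union(*Bs[n - t:])) == d ** t - k
    return True


# d = 2, n = 4, window size d^t = 4; both destinations share the prefix "00".
print(check_counts(2, 4, 2, "0000", {"0000", "0011"}))
```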
For every input $u \in A$, let $i(u)$ denote the index $i$ such that $u \in A_i$. For every $w \in [d^{n-t} - 1] = \{1, \dots, d^{n-t} - 1\}$,
let $j(w)$ be the index $j$ such that $W_w \subseteq B_j$. For every $v \in W_0 - B$, let $j(v)$ denote the largest $j$ for which $v \in B_j$.
Note that $j(v) \ge n - t$ for such output $v$ because $W_0 - B = \bigcup_{j=n-t}^{n-1} B_j$.

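Lemma 4.3. For each input $u \in A$ and each $w \in [d^{n-t} - 1]$ such that $i(u) + j(w) \ge n$, define a variable $x_{u,w}$.
Also, for each input $u \in A$ and each output $v \in W_0 - B$ such that $i(u) + j(v) \ge n$, define a variable $x_{u,v}$. Then,
the number of Banyan planes blocking the new multicast request $(a, B)$ is upper-bounded by the optimal value of the
following linear program:

\[
\begin{array}{lll}
\max & \sum_{u,w} x_{u,w} + \sum_{u,v} x_{u,v} & \qquad (4) \\
\text{s.t.} & \sum_{u} x_{u,w} \le d^t & \forall w \in [d^{n-t} - 1] \\
 & x_{u,w} \le 1 & \forall u \in A,\ w \in [d^{n-t} - 1] \\
 & \sum_{v} x_{u,v} \le 1 & \forall u \in A \\
 & \sum_{u} x_{u,v} \le 1 & \forall v \in W_0 - B \\
 & \sum_{w} x_{u,w} + \sum_{v} x_{u,v} \le f & \forall u \in A \\
 & x_{u,w},\ x_{u,v} \ge 0 & \forall u, w, v
\end{array}
\]

Obviously, the sums and the constraints only range over values for which the variables are defined.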

Proof. Suppose the network $\log_d(N, 0, m)$ already had some routes established. Consider a $BY^{-1}(n)$-plane which
blocks the new request $(a, B)$. There must be one route $R(u, v)$ on this plane for which $R(u, v)$ and $R(a, b^{(l)})$ share
a link, for some $l \in [k]$. Note that the branch $R(u, v)$ could be part of a multicast tree from input $u$, but we only need
an arbitrary blocking branch $(u, v)$ of this tree. Note also that $u \ne a$ because subrequests from the same input are
parts of the same request and thus their routes do not block one another. Let $S$ be the set constructed by arbitrarily
taking exactly one blocking branch $(u, v)$ per blocking plane. Then, the number of blocking planes is $|S|$.

Fact 1: if $(u, v)$ and $(u, v')$ are both in $S$ then $v$ and $v'$ must belong to different windows; because, if they belong
to the same window, the window algorithm would have routed them through the same plane, and $S$ only contains one
branch per blocking plane.

Fact 2: each output $v$ can only appear once in $S$, because each output can only be part of at most one existing
request.

Fact 3: if $(u, v) \in S$, then $(u, v) \in A_i \times B_j$ for some $i + j \ge n$, thanks to Proposition 2.2.

We will show that $S$ defines a feasible solution to the linear program with objective value
precisely $|S|$. Set $x_{u,w} = 1$ if there is some $(u, v) \in S$ such that $v \in W_w$, and $x_{u,v} = 1$ if there is some $(u, v) \in S$
such that $v \in W_0 - B$. All other variables are set to 0. Due to Fact 3, the procedure does not set a value for an undefined
variable. Certainly $|S|$ is equal to the objective value of this solution.

We next verify that the solution satisfies all the constraints. The first constraint expresses the fact that each output
in a window $W_w$ of size $d^t$ appears at most once in $S$ (Fact 2). The second and third constraints are a restatement
of Fact 1. Note that the sum in the third constraint is only over $v \in W_0 - B$. The fourth constraint says that each
output $v \in W_0 - B$ appears at most once in $S$ (Fact 2 again). The fifth constraint says that each input can only be part
of at most $f$ members of $S$, due to the $f$-cast nature of the network. □

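The dual linear program can be written as follows.

\[
\begin{array}{lll}
\min & \displaystyle \sum_{w} d^t \alpha_w + \sum_{u,w} \beta_{u,w} + \sum_{u} \gamma_u + \sum_{v} \delta_v + f \sum_{u} \epsilon_u & \\
\text{s.t.} & \alpha_w + \beta_{u,w} + \epsilon_u \ge 1, \quad x_{u,w} \text{ defined} & \text{(DC-1)} \\
 & \gamma_u + \delta_v + \epsilon_u \ge 1, \quad x_{u,v} \text{ defined} & \text{(DC-2)} \\
 & \alpha_w, \beta_{u,w}, \gamma_u, \delta_v, \epsilon_u \ge 0, \quad \forall u, v, w &
\end{array}
\qquad (5)
\]

Note that the dual constraints only exist over all $u, v, w$ for which $x_{u,w}$ and $x_{u,v}$ are defined; in particular, they exist for pairs $(u, w)$ such that $i(u) + j(w) \ge n$ and pairs $(u, v)$ such that $i(u) + j(v) \ge n$.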
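The following is a minimal sketch, not taken from the paper, of how a candidate dual solution can be checked mechanically. It assumes the dual LP (5) has the objective $\sum_w d^t\alpha_w + \sum_{u,w}\beta_{u,w} + \sum_u\gamma_u + \sum_v\delta_v + f\sum_u\epsilon_u$ with constraints (DC-1) and (DC-2) over the defined pairs. The function names and the toy labels `u1`, `w1`, `v1` are hypothetical.

```python
def dual_objective(alpha, beta, gamma, delta, eps, d, t, f):
    """Objective of the dual LP (5) for variables stored in dictionaries."""
    return (
        d ** t * sum(alpha.values())
        + sum(beta.values())
        + sum(gamma.values())
        + sum(delta.values())
        + f * sum(eps.values())
    )


def is_dual_feasible(alpha, beta, gamma, delta, eps, uw_pairs, uv_pairs):
    """Check non-negativity, (DC-1) on defined x_{u,w} and (DC-2) on defined x_{u,v}.

    Missing dictionary keys are treated as variables set to 0.
    """
    all_values = [v for var in (alpha, beta, gamma, delta, eps) for v in var.values()]
    if any(v < 0 for v in all_values):
        return False
    dc1 = all(
        alpha.get(w, 0) + beta.get((u, w), 0) + eps.get(u, 0) >= 1
        for u, w in uw_pairs
    )
    dc2 = all(
        gamma.get(u, 0) + delta.get(v, 0) + eps.get(u, 0) >= 1
        for u, v in uv_pairs
    )
    return dc1 and dc2


# Toy instance with hypothetical labels (not an instance from the paper).
uw_pairs = [("u1", "w1")]                 # pairs for which x_{u,w} is defined
uv_pairs = [("u1", "v1"), ("u2", "v1")]   # pairs for which x_{u,v} is defined
candidate = dict(alpha={}, beta={}, gamma={}, delta={"v1": 1}, eps={"u1": 1})
feasible = is_dual_feasible(**candidate, uw_pairs=uw_pairs, uv_pairs=uv_pairs)
value = dual_objective(**candidate, d=2, t=1, f=3)
```

By weak duality, the objective value of any solution that passes this check is an upper bound on the primal optimum, and hence on the number of blocking planes.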
4.2 Specifying a family of dual-feasible solutions

To illustrate the technique, let us first derive a couple of known results "for free." The first is Theorem III.2 in [35].

Corollary 4.4 (Theorem III.2 in [35]). Let $r = \lfloor \log_d f \rfloor$. Suppose the $1 \times m$-SE stage of the $\log_d(N, 0, m)$ network does not have fanout capability. Then, when $f < d^{n-2}$, the network is $f$-cast strictly non-blocking if

\[ m \ge d^{\lfloor (n+r)/2 \rfloor} + f \left( d^{\,n - \lfloor (n+r)/2 \rfloor - 1} - 1 \right). \]

When $f \ge d^{n-2}$, the network is $f$-cast strictly nonblocking if $m \ge d^{n-1}$.

Proof. Recall Remark 4.1: routing using the window algorithm with window size $t = n$ is the same as routing arbitrarily in the network when the $1 \times m$-SE stage cannot fanout. Thus any sufficient condition for the window algorithm to work is a strictly nonblocking condition. Note that when $t = n$ the dual constraints (DC-1) do not exist! We construct a feasible solution to the dual linear program as follows.

When $f \ge d^{n-2}$, set $\gamma_u = 1$ for all $u \in \bigcup_{i=1}^{n-1} A_i$ and all other variables to be 0. The dual objective value in this case is

\[ \sum_{u} \gamma_u = \sum_{i=1}^{n-1} |A_i| = \sum_{i=1}^{n-1} \left( d^{n-i} - d^{n-i-1} \right) = d^{n-1} - 1, \]

and hence one more plane (i.e. $m \ge d^{n-1}$) is sufficient. Note that this solution is dual feasible, because for $u \in A_0$ there is no $v$ for which $i(u) + j(v) \ge n$. In other words, there is no dual constraint for which $u \in A_0$.

Next, suppose $f < d^{n-2}$. Define $q = \lfloor (n+r)/2 \rfloor + 1$. Note that $r + 1 \le q \le n - 1$ in this case; in particular, $k d^{n-q} < d^{r+1} d^{n-q} \le d^n$, which implies $\min\{d^n - k, k(d^{n-q} - 1)\} = k(d^{n-q} - 1)$. Set $\gamma_u = 1$ for all $u$ with $i(u) \ge n - q + 1$ and $\delta_v = 1$ for all $v \in \bigcup_{j=q}^{n-1} B_j$. All other dual variables are 0. The solution is dual feasible because, for any pair $(u, v)$ for which $i(u) + j(v) \ge n$, we must either have $i(u) \ge n - q + 1$ or $j(v) \ge q$ (which is the same as saying $v \in \bigcup_{j=q}^{n-1} B_j$). Recalling Proposition 4.2, the dual objective value is

\[
\begin{aligned}
\sum_{u : i(u) \ge n-q+1} \gamma_u + \sum_{v \in \bigcup_{j=q}^{n-1} B_j} \delta_v
&\le \sum_{i=n-q+1}^{n-1} |A_i| + \min\{d^n - k, k(d^{n-q} - 1)\} \\
&= d^{q-1} - 1 + k(d^{n-q} - 1) \\
&\le d^{q-1} + f(d^{n-q} - 1) - 1.
\end{aligned}
\]

This is an upper bound on the number of blocking planes. Hence, one more plane is sufficient to route the new (arbitrary) request. □

Because unicast is 1-cast, by setting $f = 1$ in the previous corollary we obtain the following corollary, whose proof was about 5 pages long in [14]. Recall Remark 4.1, which ensures that our result is a strictly nonblocking condition rather than a wide-sense nonblocking one.

Corollary 4.5 (Theorem 1 in [14]). For $\log_d(N, 0, m)$ to be unicast strictly nonblocking, it is sufficient that

\[ m \ge d^{\lceil n/2 \rceil - 1} + d^{\lfloor n/2 \rfloor} - 1. \]
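As a quick numerical illustration, the sketch below evaluates the two dual solutions used in the proof of Corollary 4.4 and checks that $f = 1$ reproduces the unicast bound of Corollary 4.5. It assumes the $f$-cast bound reads $d^{q-1} + f(d^{n-q} - 1)$ with $q = \lfloor (n+r)/2 \rfloor + 1$, as suggested by the proof; the function names are hypothetical, and taking the minimum with $d^{n-1}$ is a convenience that avoids committing to the exact case boundary.

```python
def floor_log(f, d):
    """Largest integer r with d**r <= f, for integers f >= 1 and d >= 2."""
    r = 0
    while d ** (r + 1) <= f:
        r += 1
    return r


def fcast_strict_planes(n, d, f):
    """Sufficient number of planes obtained from the two dual solutions.

    Assumed form: min(d^(n-1), d^(q-1) + f*(d^(n-q) - 1)), q = floor((n+r)/2) + 1.
    """
    r = floor_log(f, d)
    q = min((n + r) // 2 + 1, n)  # cap at n so that d^(n-q) stays an integer
    general = d ** (q - 1) + f * (d ** (n - q) - 1)
    return min(d ** (n - 1), general)


def unicast_strict_planes(n, d):
    """Unicast bound: d^(ceil(n/2) - 1) + d^(floor(n/2)) - 1."""
    ceil_half = -(-n // 2)
    return d ** (ceil_half - 1) + d ** (n // 2) - 1


# For d = 2, n = 4 both expressions give 5 planes in the unicast case.
unicast_example = fcast_strict_planes(4, 2, 1)
```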

Corollary 4.4 solves the $t = n$ case. We will consider $t < n$ henceforth. We next specify a family of dual-feasible solutions to the dual LP (5). The main remaining task will be simple calculus as we pick the best dual-feasible solution depending on the parameters $f, n, d, t$ of the problem.

The family of dual-feasible solutions is specified with two integral parameters $p$ and $q$, where $0 \le p \le n - t - 1$ and $n - t \le q \le n$. The parameter $p$ is used to set the variables $\epsilon_u$, $\alpha_w$, and $\beta_{u,w}$, and the parameter $q$ is used to set the variables $\gamma_u$ and $\delta_v$. As we set the variables, we will also verify the feasibility of the constraints (DC-1) and (DC-2), and the contributions of those variables to the final objective value.
• Specifying the $\epsilon_u$ variables. Set $\epsilon_u = 1$ if $i(u) \ge n - p$ and 0 otherwise. The contribution of the $\epsilon_u$ to the objective is

\[ \sum_{u} f \epsilon_u = \sum_{i=n-p}^{n-1} f \sum_{u : i(u) = i} 1 = \sum_{i=n-p}^{n-1} f |A_i| = f(d^p - 1). \]

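• Specifying the $\alpha_w$ and $\beta_{u,w}$ variables. Next, we define the $\alpha_w$ and $\beta_{u,w}$. The constraints (DC-1) with $i(u) \ge n - p$ are already satisfied by the $\epsilon_u$; hence we only need to set the $\alpha_w$ and $\beta_{u,w}$ to satisfy (DC-1) when $j(w) \ge p + 1$. (If $j(w) \le p$, then for the constraint to exist we must have $i(u) \ge n - j(w) \ge n - p$.) The variables $\alpha_w$ and $\beta_{u,w}$ are set differently based on three cases as follows.

Case 1. If $t \ge \lfloor n/2 \rfloor$, then set $\beta_{u,w} = 1$ whenever $p + 1 \le j(w) \le n - t - 1$ and $n - j(w) \le i(u) \le n - p - 1$, and set all other $\alpha_w$ and $\beta_{u,w}$ to be 0. It can be verified straightforwardly that all constraints (DC-1) are satisfied. Recall that the number of windows for which $j(w) = j$ is precisely $d^{n-j-t} - d^{n-j-t-1}$. Thus, the contribution of the $\alpha_w$ and $\beta_{u,w}$ to the dual objective value is

\[
\begin{aligned}
\sum_{\substack{u, w \\ p+1 \le j(w) \le n-t-1 \\ n-j(w) \le i(u) \le n-p-1}} \beta_{u,w}
&= \sum_{j=p+1}^{n-t-1} |\{w : j(w) = j\}| \sum_{i=n-j}^{n-p-1} |\{u \in \mathcal{A} : i(u) = i\}| \\
&= \sum_{j=p+1}^{n-t-1} \sum_{i=n-j}^{n-p-1} (d^{n-j-t} - d^{n-j-t-1})(d^{n-i} - d^{n-i-1}) \\
&= \sum_{j=p+1}^{n-t-1} (d^{n-j-t} - d^{n-j-t-1})(d^{j} - d^{p}) \\
&= (n - t - 1 - p)(d^{n-t} - d^{n-t-1}) - d^{n-t-1} + d^p.
\end{aligned}
\]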

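Case 2. When $p + 1 \le t \le \lfloor n/2 \rfloor - 1$, set $\beta_{u,w} = 1$ whenever $p + 1 \le j(w) \le t$ and $n - j(w) \le i(u) \le n - p - 1$, and $\alpha_w = 1$ for $t + 1 \le j(w) \le n - t - 1$, and all other $\alpha_w$ and $\beta_{u,w}$ to be 0. All constraints (DC-1) are thus satisfied. The contribution of the $\alpha_w$ and $\beta_{u,w}$ to the objective is

\[
\begin{aligned}
\sum_{\substack{w \\ t < j(w) < n-t}} d^t \alpha_w + \sum_{\substack{u, w \\ p+1 \le j(w) \le t \\ n-j(w) \le i(u) \le n-p-1}} \beta_{u,w}
&= \sum_{j=t+1}^{n-t-1} d^t \, |\{w : j(w) = j\}| + \sum_{j=p+1}^{t} |\{w : j(w) = j\}| \sum_{i=n-j}^{n-p-1} |\{u \in \mathcal{A} : i(u) = i\}| \\
&= \sum_{j=t+1}^{n-t-1} d^t (d^{n-j-t} - d^{n-j-t-1}) + \sum_{j=p+1}^{t} (d^{n-j-t} - d^{n-j-t-1})(d^{j} - d^{p}) \\
&= (t - p)(d^{n-t} - d^{n-t-1}) + d^{n+p-2t-1} - d^t.
\end{aligned}
\]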

Case 3. When $t \le p$ (which is $\le n - t - 1$), set $\alpha_w = 1$ for $p + 1 \le j(w) \le n - t - 1$ and all the $\beta_{u,w}$ to be zero. Again, the feasibility of the constraints (DC-1) is easy to verify. The contribution to the objective value is

\[
\sum_{\substack{w \\ p < j(w) < n-t}} d^t \alpha_w = \sum_{j=p+1}^{n-t-1} d^t \, |\{w : j(w) = j\}| = \sum_{j=p+1}^{n-t-1} d^t (d^{n-j-t} - d^{n-j-t-1}) = d^{n-p-1} - d^t.
\]

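• Specifying the $\gamma_u$ and $\delta_v$ variables. Here, there are two cases.

When $q = n - t$, set $\delta_v = 1$ for all $v \in \bigcup_{j=n-t}^{n-1} B_j$ and $\gamma_u = 0$. The dual-objective contribution in this case is

\[ \Bigl| \bigcup_{j=n-t}^{n-1} B_j \Bigr| = d^t - k. \]

When $n - t + 1 \le q \le n$, define $\delta_v = 1$ for all $v \in \bigcup_{j=q}^{n-1} B_j$, $\gamma_u = 1$ for all $u$ such that $n - q + 1 \le i(u) \le n - p - 1$, and all other $\delta_v$ and $\gamma_u$ are set to be zero. From Proposition 4.2, the total contribution of the $\gamma_u$ and $\delta_v$ to the dual-objective is at most

\[
\begin{aligned}
\sum_{u : n-q+1 \le i(u) \le n-p-1} \gamma_u + \sum_{v \in \bigcup_{j=q}^{n-1} B_j} \delta_v
&\le \sum_{i=n-q+1}^{n-p-1} |A_i| + \min\{d^t - k, k(d^{n-q} - 1)\} \\
&= d^{q-1} - d^p + \min\{d^t - k, k(d^{n-q} - 1)\}.
\end{aligned}
\]

The feasibility of all the constraints (DC-2) is easy to verify.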
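The closed forms in the list above are all telescoping sums, so they can be sanity-checked by brute force. The sketch below assumes $|A_i| = d^{n-i} - d^{n-i-1}$ and that the number of windows with $j(w) = j$ is $d^{n-j-t} - d^{n-j-t-1}$; the helper names and the test value `f=5` are hypothetical.

```python
def size_A(i, n, d):
    """Assumed |A_i| = d^(n-i) - d^(n-i-1) for 1 <= i <= n-1."""
    return d ** (n - i) - d ** (n - i - 1)


def windows_at(j, n, d, t):
    """Assumed number of windows w with j(w) = j."""
    return d ** (n - j - t) - d ** (n - j - t - 1)


def sum_A(lo, hi, n, d):
    """Sum of |A_i| over lo <= i <= hi (an empty range sums to 0)."""
    return sum(size_A(i, n, d) for i in range(lo, hi + 1))


def check_closed_forms(max_n=7, max_d=3, f=5):
    """Compare each brute-force sum with its closed form on small instances."""
    for n in range(2, max_n + 1):
        for d in range(2, max_d + 1):
            for t in range(0, n):
                for p in range(0, n - t):
                    # epsilon contribution
                    assert f * sum_A(n - p, n - 1, n, d) == f * (d ** p - 1)
                    # Case 1 (the identity itself holds for every t)
                    brute = sum(
                        windows_at(j, n, d, t) * sum_A(n - j, n - p - 1, n, d)
                        for j in range(p + 1, n - t)
                    )
                    closed = ((n - t - 1 - p) * (d ** (n - t) - d ** (n - t - 1))
                              - d ** (n - t - 1) + d ** p)
                    assert brute == closed
                    # Case 2
                    if p + 1 <= t <= n // 2 - 1:
                        brute = sum(
                            d ** t * windows_at(j, n, d, t) for j in range(t + 1, n - t)
                        ) + sum(
                            windows_at(j, n, d, t) * sum_A(n - j, n - p - 1, n, d)
                            for j in range(p + 1, t + 1)
                        )
                        closed = ((t - p) * (d ** (n - t) - d ** (n - t - 1))
                                  + d ** (n + p - 2 * t - 1) - d ** t)
                        assert brute == closed
                    # Case 3
                    if t <= p:
                        brute = sum(
                            d ** t * windows_at(j, n, d, t) for j in range(p + 1, n - t)
                        )
                        assert brute == d ** (n - p - 1) - d ** t
                    # gamma contribution for q > n - t
                    for q in range(n - t + 1, n + 1):
                        assert sum_A(n - q + 1, n - p - 1, n, d) == d ** (q - 1) - d ** p
    return True


all_match = check_closed_forms()
```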

Define the "cost" $c(k, p, q)$ to be the total contribution of all variables to the dual-objective value. We summarize the values of $c(k, p, q)$ in Figure 5. We just proved the following.

Theorem 4.6. The above family of solutions is feasible for the dual linear program (5) with objective value equal to $c(k, p, q)$. Consequently, for the network $\log_d(N, 0, m)$ to be wide-sense nonblocking under the window algorithm with window size $d^t$, it is sufficient that

\[ m \ge 1 + \max_{1 \le k \le \min(f, d^t)} \min_{p, q} c(k, p, q). \qquad (6) \]
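For small parameters, condition (6) can simply be evaluated by enumeration. The sketch below is an illustration under stated assumptions rather than the paper's procedure: the cost function follows the three $\alpha/\beta$ cases and two $\gamma/\delta$ cases as reconstructed from the garbled source, and the names `cost` and `planes_bound` are hypothetical.

```python
def cost(k, p, q, n, d, t, f):
    """Assumed dual objective c(k, p, q) of the family of dual solutions."""
    eps = f * (d ** p - 1)
    if t >= n // 2:                      # Case 1
        ab = ((n - t - 1 - p) * (d ** (n - t) - d ** (n - t - 1))
              - d ** (n - t - 1) + d ** p)
    elif p + 1 <= t:                     # Case 2: p + 1 <= t <= floor(n/2) - 1
        ab = ((t - p) * (d ** (n - t) - d ** (n - t - 1))
              + d ** (n + p - 2 * t - 1) - d ** t)
    else:                                # Case 3: t <= p
        ab = d ** (n - p - 1) - d ** t
    if q == n - t:
        gd = d ** t - k
    else:                                # n - t + 1 <= q <= n
        gd = d ** (q - 1) - d ** p + min(d ** t - k, k * (d ** (n - q) - 1))
    return eps + ab + gd


def planes_bound(n, d, t, f):
    """Right-hand side of (6): 1 + max over k of min over (p, q) of c(k, p, q)."""
    if not 0 <= t < n:
        raise ValueError("this sketch only covers 0 <= t < n")
    worst = max(
        min(
            cost(k, p, q, n, d, t, f)
            for p in range(0, n - t)          # 0 <= p <= n - t - 1
            for q in range(n - t, n + 1)      # n - t <= q <= n
        )
        for k in range(1, min(f, d ** t) + 1)
    )
    return 1 + worst


# d = 2, n = 4: multicast (f = 16) with t = 1, and unicast (f = 1) with t = 0.
multicast_t1 = planes_bound(4, 2, 1, 16)
unicast_t0 = planes_bound(4, 2, 0, 1)
```

With these inputs the enumeration gives 6 planes for the multicast case and 5 planes for the unicast case, the latter agreeing with the unicast strictly nonblocking bound $d^{\lceil n/2 \rceil - 1} + d^{\lfloor n/2 \rfloor} - 1$.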



4.3 Selecting the best dual-feasible solution 

It is a straightforward though somewhat analytically tedious task to derive the best possible sufficient condition using Theorem 4.6. The idea is, for a given $k \le \min(f, d^t)$, we first choose $p = p_k$, $q = q_k$ so that $c(k, p_k, q_k)$ is as small as possible. Then, we derive an upper bound $C(t, f) \ge \max_k c(k, p_k, q_k)$. The sufficient condition is then

\[ m \ge C(t, f) + 1. \]

We first need a technical lemma.

Lemma 4.7. Let $d, n, k$ be positive integers, and $x = \lfloor \log_d k \rfloor$. Then, the following function

\[ h(k) = d^{\lfloor (n+x)/2 \rfloor} + k \left( d^{\,n - \lfloor (n+x)/2 \rfloor - 1} - 1 \right) \qquad (7) \]

is non-decreasing in $k$.
The objective value $c(k, p, q)$

For $t \ge \lfloor n/2 \rfloor$ and $q = n - t$,

\[ c(k,p,q) = f(d^p - 1) + (n - t - 1 - p)(d^{n-t} - d^{n-t-1}) - d^{n-t-1} + d^p + d^t - k. \]

For $t \ge \lfloor n/2 \rfloor$ and $q > n - t$,

\[ c(k,p,q) = f(d^p - 1) + (n - t - 1 - p)(d^{n-t} - d^{n-t-1}) - d^{n-t-1} + d^{q-1} + \min\{d^t - k, k(d^{n-q} - 1)\}. \]

For $p + 1 \le t \le \lfloor n/2 \rfloor - 1$ and $q = n - t$,

\[ c(k,p,q) = f(d^p - 1) + (t - p)(d^{n-t} - d^{n-t-1}) + d^{n+p-2t-1} - k. \]

For $p + 1 \le t \le \lfloor n/2 \rfloor - 1$ and $q > n - t$,

\[ c(k,p,q) = f(d^p - 1) + (t - p)(d^{n-t} - d^{n-t-1}) + d^{n+p-2t-1} - d^t + d^{q-1} - d^p + \min\{d^t - k, k(d^{n-q} - 1)\}. \]

For $t \le p$ and $q = n - t$,

\[ c(k,p,q) = f(d^p - 1) + d^{n-p-1} - k. \]

For $t \le p$ and $q > n - t$,

\[ c(k,p,q) = f(d^p - 1) + d^{n-p-1} - d^t + d^{q-1} - d^p + \min\{d^t - k, k(d^{n-q} - 1)\}. \]

Figure 5: The dual objective value of the family of dual-feasible solutions.



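The upper bound $C(t, f)$

To shorten the notation, let $r = \lfloor \log_d f \rfloor$.

\[
C(t,f) =
\begin{cases}
f\left(d^{\lceil (n-r)/2 \rceil - 1} - 1\right) + d^{\lfloor (n+r)/2 \rfloor} - 1 & t < \lfloor n/2 \rfloor,\ r \le n - 2t - 1 \\[4pt]
t(d-1)\,d^{n-t-1} + d^{n-2t-1} - 1 & t < \lfloor n/2 \rfloor,\ r \ge n - 2t \\[4pt]
[(n-t-1)(d-1) - 1]\,d^{n-t-1} + d^t - (d-1)\,d^{2t-n-1} & t \ge \lfloor n/2 \rfloor,\ r \ge n - t \\[4pt]
f\left(d^{n-t-r-1} - 1\right) + [r(d-1) - 1]\,d^{n-t-1} + d^{n-t-r-1} + d^t - (d-1)\,d^{2t-n-1} & t \ge \lfloor n/2 \rfloor, \text{ and } 2t-n-2 < r \le n-t-1 \\[4pt]
f\left(d^{n-t-r-1} - 1\right) + [r(d-1) - 1]\,d^{n-t-1} + d^{\lfloor (n+r)/2 \rfloor} + f\left(d^{\,n - \lfloor (n+r)/2 \rfloor - 1} - 1\right) & t \ge \lfloor n/2 \rfloor, \text{ and } r \le \min(2t-n-2,\ n-t-1)
\end{cases}
\]

Figure 6: We show in Theorem 4.8 that $C(t, f) \ge \max_k \min_{p,q} c(k, p, q)$.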



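Proof. We induct on $k$. The claim trivially holds when $k = 1$. Consider $k \ge 2$; it suffices to show $h(k-1) \le h(k)$. First, suppose $k$ is not an exact power of $d$, i.e. $k > d^x$. In this case $\lfloor \log_d (k-1) \rfloor = x$, and we have

\[
h(k-1) = d^{\lfloor (n+x)/2 \rfloor} + (k-1)\left(d^{\,n - \lfloor (n+x)/2 \rfloor - 1} - 1\right)
\le d^{\lfloor (n+x)/2 \rfloor} + k\left(d^{\,n - \lfloor (n+x)/2 \rfloor - 1} - 1\right) = h(k).
\]

Second, consider the case when $k = d^x$. It can be verified that, no matter what the parities of $n$ and $x$ are, the multiset $\{\lfloor (n+x-1)/2 \rfloor,\ n + x - \lfloor (n+x-1)/2 \rfloor - 1\}$ is exactly equal to the multiset $\{\lfloor (n+x)/2 \rfloor,\ \lceil (n+x)/2 \rceil - 1\}$. Thus, noting that $\lfloor \log_d (k-1) \rfloor = x - 1$, we have

\[
\begin{aligned}
h(k-1) &= d^{\lfloor (n+x-1)/2 \rfloor} + (k-1)\left(d^{\,n - \lfloor (n+x-1)/2 \rfloor - 1} - 1\right) \\
&= d^{\lfloor (n+x-1)/2 \rfloor} + d^{\,n + x - \lfloor (n+x-1)/2 \rfloor - 1} - d^x - d^{\,n - \lfloor (n+x-1)/2 \rfloor - 1} + 1 \\
&\le d^{\lfloor (n+x-1)/2 \rfloor} + d^{\,n + x - \lfloor (n+x-1)/2 \rfloor - 1} - d^x \\
&= d^{\lfloor (n+x)/2 \rfloor} + d^{\lceil (n+x)/2 \rceil - 1} - d^x \\
&= d^{\lfloor (n+x)/2 \rfloor} + d^x\left(d^{\,n - \lfloor (n+x)/2 \rfloor - 1} - 1\right) \\
&= h(k).
\end{aligned}
\]

□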
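The monotonicity claimed in Lemma 4.7 is easy to test numerically for small parameters. The sketch below assumes (7) reads $h(k) = d^{\lfloor (n+x)/2 \rfloor} + k(d^{n - \lfloor (n+x)/2 \rfloor - 1} - 1)$ with $x = \lfloor \log_d k \rfloor$, and restricts to $k < d^n$ so that every exponent is a non-negative integer; the function names are hypothetical.

```python
def floor_log(k, d):
    """Largest integer x with d**x <= k, for integers k >= 1 and d >= 2."""
    x = 0
    while d ** (x + 1) <= k:
        x += 1
    return x


def h(k, n, d):
    """Assumed form of the function in (7)."""
    a = (n + floor_log(k, d)) // 2
    return d ** a + k * (d ** (n - a - 1) - 1)


def is_nondecreasing(n, d):
    """Check h(1) <= h(2) <= ... <= h(d^n - 1)."""
    values = [h(k, n, d) for k in range(1, d ** n)]
    return all(a <= b for a, b in zip(values, values[1:]))


monotone_everywhere = all(
    is_nondecreasing(n, d) for n in range(1, 7) for d in range(2, 5)
)
```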



Theorem 4.8. The $\log_d(N, 0, m)$ network is nonblocking under the window algorithm with window size $d^t$ if $m \ge 1 + C(t, f)$, where $C(t, f)$ is defined in Figure 6.

Proof. Consider the 5 cases in the definition of $C(t, f)$. We specify for each $k$ how to set the values $p_k$ and $q_k$. The straightforward task of verifying that $c(k, p_k, q_k) \le C(t, f)$ is mostly omitted due to space constraints, except for situations when it is tricky to verify.

• Case 1: $t < \lfloor n/2 \rfloor$, $r \le n - 2t - 1$. For any $k$, choose $p_k = \lceil (n-r)/2 \rceil - 1$ and $q_k = n - t$.

• Case 2: $t < \lfloor n/2 \rfloor$, $r \ge n - 2t$. For any $k$, set $p_k = 0$ and $q_k = n - t$.



• Case 3: $t \ge \lfloor n/2 \rfloor$, $r \ge n - t$. This case is a little trickier analytically. Define $x = \lfloor \log_d k \rfloor$. We set $p_k$ and $q_k$ differently depending on how large $x$ is, so that the inequality $c(k, p_k, q_k) \le C(t, f)$ always holds.

If $0 \le x \le 2t - n - 2$, which can only hold when $t > n/2$, then set $q_k = \lfloor (n+x)/2 \rfloor + 1$ and $p_k = 0$. Note that $q_k > n - t$ and $x + 1 + n - q_k \le t$. Thus $k d^{n - q_k} \le d^t$. Recall from Lemma 4.7 that the function $h(k)$ defined in (7) is non-decreasing, and that in this case $k \le d^{x+1} - 1 \le d^{2t-n-1} - 1$. We have

\[
\begin{aligned}
c(k, p_k, q_k) &= [(n-t-1)(d-1) - 1]\,d^{n-t-1} + d^{q_k - 1} + \min\{d^t - k,\ k(d^{n-q_k} - 1)\} \\
&= [(n-t-1)(d-1) - 1]\,d^{n-t-1} + d^{q_k - 1} + k(d^{n-q_k} - 1) \\
&= [(n-t-1)(d-1) - 1]\,d^{n-t-1} + h(k) \\
&\le [(n-t-1)(d-1) - 1]\,d^{n-t-1} + h(d^{2t-n-1} - 1) \\
&= [(n-t-1)(d-1) - 1]\,d^{n-t-1} + d^{t-1} + (d^{2t-n-1} - 1)(d^{n-t} - 1) \\
&\le [(n-t-1)(d-1) - 1]\,d^{n-t-1} + d^{t-1} + d^{2t-n-1}(d - 1)(d^{n-t} - 1) \\
&= [(n-t-1)(d-1) - 1]\,d^{n-t-1} + d^t - (d-1)\,d^{2t-n-1} \\
&= C(t, f).
\end{aligned}
\]

If $x = 2t - n - 1$ and $k \le d^{x+1} - d^x$, then set $q_k = \lfloor (n+x)/2 \rfloor + 1 = t$ and $p_k = 0$. If $x = 2t - n - 1$ and $k \ge d^{x+1} - d^x + 1$, then set $q_k = n - t$ and $p_k = 0$. Finally, when $x \ge 2t - n$, we again set $q_k = n - t$ and $p_k = 0$.

• Case 4: $t \ge \lfloor n/2 \rfloor$, $2t - n - 2 < r \le n - t - 1$. Note that this case can only happen when $t \le 2n/3$. In particular, if $t > 2n/3$ and $r \le n - t - 1$ we would be in Case 5. Set $p_k = n - t - r - 1$ and $q_k = \lfloor (n+x)/2 \rfloor + 1$. Proving $c(k, p_k, q_k) \le C(t, f)$ is almost identical to Case 3, where we consider different ranges of $x = \lfloor \log_d k \rfloor$.

• Case 5: $t \ge \lfloor n/2 \rfloor$, $r \le \min(2t - n - 2, n - t - 1)$. Set $p_k = n - t - r - 1$ and $q_k = \lfloor (n+x)/2 \rfloor + 1$. Showing $c(k, p_k, q_k) \le C(t, f)$ is similar to Case 3. The only slight variation is that, instead of bounding $k \le d^{x+1} - 1$, we apply $k \le f$ directly. The function $h(k)$ is then bounded by $h(f)$. Furthermore, we do not have to consider the cases when $x \ge 2t - n - 1$ because $x \le r \le 2t - n - 2$.



□ 
4.4 Some quick consequences of Theorem 4.8

All we have to do is to plug in the parameters $t$ and $f$ and compute $1 + C(t, f)$ to get the following results.

Corollary 4.9 (Theorem 4 in [10]). Let $r = \lfloor \log_d f \rfloor$. The network $\log_d(N, 0, m)$ is $f$-cast strictly non-blocking if $m \ge 1 + C(0, f)$, where $C(t, f)$ is defined in Figure 6.

Proof. This corresponds to the $t = 0$ case of the window algorithm, which becomes a strictly nonblocking condition as noted earlier. □

The following result took about 6 pages in [7] to be proved (in two theorems) with combinatorial reasoning. The result is on the general multicast case, without the fanout restriction $f$. In our setting, we can simply set $f = N = d^n$. In fact, even though the corollary states exactly the same results as in [7], the statement is simpler.

Corollary 4.10 (Theorems 1 and 2 in [7]). The $d$-ary multi-log network $\log_d(N, 0, m)$ is wide-sense nonblocking with respect to the window algorithm with window size $d^t$ if

\[
m \ge
\begin{cases}
d^{n-2t-1} + t\,d^{n-t-1}(d - 1) & \text{when } t \le \lfloor n/2 \rfloor - 1, \\[4pt]
d^{n-t-1}[(n-t-1)(d-1) - 1] + d^t - d^{2t-n-1}(d - 1) + 1 & \text{when } t \ge \lfloor n/2 \rfloor.
\end{cases}
\]



5 Analyzing crosstalk-free f-cast wide-sense nonblocking multilog networks

When the multi-log architecture is employed to design a photonic switch, each 2 × 2 switching element (SE) needs to be replaced by a functionally equivalent optical component. For instance, when $d = 2$ we can use so-called directional couplers as SEs [21, 30, 36]. However, directional couplers and many other optical switching elements suffer from optical crosstalk between interfering channels, which is one of the major obstacles in designing cost-effective switches [3, 8, 33]. To cope with crosstalk, the crosstalk-free constraint is a common requirement, which states that no two routes can share a common SE [3, 8, 16, 17, 20, 34, 35].

Thanks to Proposition 2.1, to analyze crosstalk-free $f$-cast wide-sense nonblocking multilog networks under the window algorithm, basically all we have to do is to replace the constraint $i + j \ge n$ by the constraint $i + j \ge n - 1$. That was essentially the only difference between Propositions 2.1 and 2.2. Replacing $n$ by $n - 1$ leads to changes in the final formula for the required number of Banyan planes. Deriving the formulas is relatively straightforward but also takes some calculus effort, and thus we do so here. The overall outline of the analysis, however, is identical and we can reuse much of the analysis for the non crosstalk-free case.

5.1 Setting up the linear program and its dual 



We use the same notation as in the previous section. The following lemma is the crosstalk-free analog of Lemma 4.3.

Lemma 5.1. For each input $u \in \mathcal{A}$ and each $w \in [d^{n-t} - 1]$ such that $i(u) + j(w) \ge n - 1$, define a variable $x_{u,w}$. Also, for each input $u \in \mathcal{A}$ and each output $v \in W_0 - B$ such that $i(u) + j(v) \ge n - 1$, define a variable $x_{u,v}$. Then, the number of Banyan planes blocking $(a, B)$ is upper bounded by the optimal value of the linear program of Lemma 4.3, whose dual is (5).

We next derive some quick consequences of the formulation. 
Corollary 5.2 (Theorem III.1 in [35]). Let $r = \lfloor \log_d f \rfloor$. Suppose the $1 \times m$-SE stage of the $\log_d(N, 0, m)$ network does not have fanout capability. Then, when $f < d^{n-2}(d - 1)$, the network is crosstalk-free $f$-cast strictly non-blocking if

\[ m \ge d^{n-p} + f(d^p - 1), \quad \text{where } p = \lceil (n - r - 1)/2 \rceil. \]

When $f \ge d^{n-2}(d - 1)$, the network is crosstalk-free $f$-cast strictly nonblocking if $m \ge d^n - d^{n-2}(d - 1)$.

Proof. Routing using the window algorithm with window size $t = n$ is the same as routing arbitrarily in the network when the $1 \times m$-SE stage cannot fanout. Thus any sufficient condition for the window algorithm to work is a strictly nonblocking condition. Note that when $t = n$ the dual constraints (DC-1) do not exist. Consider a solution to the dual LP as follows.

When $f \ge d^{n-2}(d - 1)$, consider two cases. If $k > d^{n-2}(d - 1)$, set $\delta_v = 1$ for all $v \in \bigcup_{j=0}^{n-1} B_j$ and all other variables to be 0. Then, the dual objective value is

\[ \Bigl| \bigcup_{j=0}^{n-1} B_j \Bigr| = d^n - k \le d^n - d^{n-2}(d - 1) - 1. \]

Thus, in this case $d^n - d^{n-2}(d - 1)$ Banyan planes are sufficient. Next, suppose $k \le d^{n-2}(d - 1)$, in which case $kd \le d^n$. Set $\gamma_u = 1$ for all $u$ with $i(u) \ge 1$, $\delta_v = 1$ for all $v \in B_{n-1}$, and all other variables to be 0. The solution is dual-feasible with dual objective value

\[
\begin{aligned}
\sum_{u : i(u) \ge 1} \gamma_u + \sum_{v \in B_{n-1}} \delta_v
&\le \sum_{i=1}^{n-1} |A_i| + |B_{n-1}| \\
&\le \sum_{i=1}^{n-1} \left(d^{n-i} - d^{n-i-1}\right) + \min\{d^n - k,\ k(d - 1)\} \\
&= d^{n-1} - 1 + k(d - 1) \\
&\le d^{n-1} + d^{n-2}(d - 1)^2 - 1,
\end{aligned}
\]

and thus again $d^n - d^{n-2}(d - 1)$ Banyan planes are sufficient.

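Next, consider the case when $f < d^{n-2}(d - 1)$. In this case $r \le n - 2$. Let $p = \lceil (n-r-1)/2 \rceil$. Then, $1 \le p \le \lceil (n-1)/2 \rceil$. Furthermore,

\[ k d^p \le f d^p < d^{r+1} d^{\lceil (n-r-1)/2 \rceil} = d^{\lceil (n+r+1)/2 \rceil} \le d^n. \]

Set $\gamma_u = 1$ for all $u$ with $i(u) \ge p$, $\delta_v = 1$ for all $v \in \bigcup_{j=n-p}^{n-1} B_j$, and all other variables to be 0. The solution is dual feasible because, for any pair $(u, v)$ for which $i(u) + j(v) \ge n - 1$, we must either have $i(u) \ge p$ or $j(v) \ge n - p$ (which is the same as saying $v \in \bigcup_{j=n-p}^{n-1} B_j$). Recalling Proposition 4.2 and the fact that $k d^p \le d^n$ shown above, the dual objective value is

\[
\begin{aligned}
\sum_{u : i(u) \ge p} \gamma_u + \sum_{v \in \bigcup_{j=n-p}^{n-1} B_j} \delta_v
&\le \sum_{i=p}^{n-1} |A_i| + \Bigl| \bigcup_{j=n-p}^{n-1} B_j \Bigr| \\
&\le \sum_{i=p}^{n-1} \left(d^{n-i} - d^{n-i-1}\right) + \min\{d^n - k,\ k(d^p - 1)\} \\
&= d^{n-p} - 1 + k(d^p - 1) \\
&\le d^{n-p} + f(d^p - 1) - 1.
\end{aligned}
\]

Hence, in this case $d^{n-p} + f(d^p - 1)$ is a sufficient number of Banyan planes.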



□ 
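The second sub-case of the proof relies on a short algebraic identity to turn the bound $d^{n-1} + d^{n-2}(d-1)^2 - 1$ into the stated number of planes. Assuming the exponents are read correctly from the damaged source, the expansion is:

```latex
\begin{aligned}
d^{n-1} + d^{n-2}(d-1)^2
  &= d^{n-2}\bigl(d + d^2 - 2d + 1\bigr) \\
  &= d^{n-2}\bigl(d^2 - d + 1\bigr) \\
  &= d^{n} - d^{n-1} + d^{n-2} \\
  &= d^{n} - d^{n-2}(d-1).
\end{aligned}
```

So at most $d^n - d^{n-2}(d-1) - 1$ planes are blocking, and $d^n - d^{n-2}(d-1)$ planes suffice in both sub-cases.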
5.2 Specifying a family of dual-feasible solutions

The family of dual-feasible solutions is specified with two integral parameters $p$ and $q$, where $0 \le p \le n - t - 1$ and $n - t \le q \le n$. The parameter $p$ is used to set the variables $\epsilon_u$, $\alpha_w$, and $\beta_{u,w}$, and the parameter $q$ is used to set the variables $\gamma_u$ and $\delta_v$. As we set the variables, we will also verify the feasibility of the constraints (DC-1) and (DC-2), and the contributions of those variables to the final objective value.

• Specifying the $\epsilon_u$ variables. Set $\epsilon_u = 1$ if $i(u) \ge n - p$ and 0 otherwise. The contribution of the $\epsilon_u$ to the objective is

\[ \sum_{u} f \epsilon_u = \sum_{i=n-p}^{n-1} f \sum_{u : i(u) = i} 1 = \sum_{i=n-p}^{n-1} f |A_i| = f(d^p - 1). \]

• Specifying the $\alpha_w$ and $\beta_{u,w}$ variables. Next, we define the $\alpha_w$ and $\beta_{u,w}$. The constraints (DC-1) with $i(u) \ge n - p$ are already satisfied by the $\epsilon_u$; hence we only need to set the $\alpha_w$ and $\beta_{u,w}$ to satisfy (DC-1) when $j(w) \ge p$. (If $j(w) \le p - 1$, then for the constraint to exist we must have $i(u) \ge n - 1 - j(w) \ge n - p$.) The variables $\alpha_w$ and $\beta_{u,w}$ are set differently based on three cases as follows.

Case 1. If $t \ge \lceil n/2 \rceil$, then set $\beta_{u,w} = 1$ whenever $p \le j(w) \le n - t - 1$ and $n - 1 - j(w) \le i(u) \le n - p - 1$, and set all other $\alpha_w$ and $\beta_{u,w}$ to be 0. It can be verified straightforwardly that all constraints (DC-1) are satisfied. Thus, the contribution of the $\alpha_w$ and $\beta_{u,w}$ to the dual objective value is

\[
\begin{aligned}
\sum_{\substack{u, w \\ p \le j(w) \le n-t-1 \\ n-1-j(w) \le i(u) \le n-p-1}} \beta_{u,w}
&= \sum_{j=p}^{n-t-1} |\{w : j(w) = j\}| \sum_{i=n-1-j}^{n-p-1} |A_i| \\
&= \sum_{j=p}^{n-t-1} \left(d^{n-j-t} - d^{n-j-t-1}\right) \sum_{i=n-1-j}^{n-p-1} |A_i| \\
&= (n - t - p)(d^{n-t+1} - d^{n-t}) - d^{n-t} + d^p.
\end{aligned}
\]

The second equality follows from the fact that the number of windows for which $j(w) = j$ is precisely $d^{n-j-t} - d^{n-j-t-1}$.

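Case 2. When $p + 1 \le t \le \lceil n/2 \rceil - 1$, set $\beta_{u,w} = 1$ whenever $p \le j(w) \le t - 1$ and $n - 1 - j(w) \le i(u) \le n - p - 1$, and $\alpha_w = 1$ for $t \le j(w) \le n - t - 1$, and all other $\alpha_w$ and $\beta_{u,w}$ to be 0. All constraints (DC-1) are thus satisfied. The contribution of the $\alpha_w$ and $\beta_{u,w}$ to the objective is

\[
\begin{aligned}
\sum_{\substack{w \\ t \le j(w) \le n-t-1}} d^t \alpha_w + \sum_{\substack{u, w \\ p \le j(w) \le t-1 \\ n-1-j(w) \le i(u) \le n-p-1}} \beta_{u,w}
&= \sum_{j=t}^{n-t-1} d^t \left(d^{n-j-t} - d^{n-j-t-1}\right) + \sum_{j=p}^{t-1} \left(d^{n-j-t} - d^{n-j-t-1}\right)\left(d^{j+1} - d^p\right) \\
&= (t - p)(d^{n-t+1} - d^{n-t}) + d^{n+p-2t} - d^t.
\end{aligned}
\]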



Case 3. When $t \le p$ (which is $\le n - t - 1$), set

\[
\alpha_w =
\begin{cases}
1 & p \le j(w) \le n - t - 1 \\
0 & \text{otherwise}
\end{cases}
\]

and all the $\beta_{u,w}$ to be zero. Again, the feasibility of the constraints (DC-1) is easy to verify. The contribution to the objective value is

\[
\sum_{\substack{w \\ p \le j(w) < n-t}} d^t \alpha_w = \sum_{j=p}^{n-t-1} d^t \, |\{w : j(w) = j\}| = d^{n-p} - d^t.
\]

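• Specifying the $\gamma_u$ and $\delta_v$ variables. When $q = n - t$, set $\delta_v = 1$ for all $v \in \bigcup_{j=n-t}^{n-1} B_j$ and $\gamma_u = 0$. The dual-objective contribution in this case is

\[ \Bigl| \bigcup_{j=n-t}^{n-1} B_j \Bigr| = d^t - k. \]

When $n - t + 1 \le q \le n$, define $\delta_v = 1$ for all $v \in \bigcup_{j=q}^{n-1} B_j$, $\gamma_u = 1$ for all $u$ such that $n - q \le i(u) \le n - p - 1$, and all other $\delta_v$ and $\gamma_u$ are set to be zero. From Proposition 4.2, the total contribution of the $\gamma_u$ and $\delta_v$ to the dual-objective is

\[
\begin{aligned}
\sum_{u : n-q \le i(u) \le n-p-1} \gamma_u + \sum_{v \in \bigcup_{j=q}^{n-1} B_j} \delta_v
&= \sum_{i=n-q}^{n-p-1} |\{u : i(u) = i\}| + \Bigl| \bigcup_{j=q}^{n-1} B_j \Bigr| \\
&\le \sum_{i=n-q}^{n-p-1} |A_i| + \min\{d^t - k,\ k(d^{n-q} - 1)\} \\
&= d^{q} - d^p + \min\{d^t - k,\ k(d^{n-q} - 1)\}.
\end{aligned}
\]

The feasibility of all the constraints (DC-2) is easy to verify.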

Define the "cost" $g(k, p, q)$ to be the total contribution of all variables to the dual-objective value. We summarize the values of $g(k, p, q)$ in Figure 7. We just proved the following.

Theorem 5.3. The above family of solutions is feasible for the dual LP (5) with objective value equal to $g(k, p, q)$. (Recall that, in this problem we are working on the dual constraints for which $i(u) + j(v) \ge n - 1$ and $i(u) + j(w) \ge n - 1$.) Consequently, for the network $\log_d(N, 0, m)$ to be crosstalk-free $f$-cast wide-sense nonblocking under the window algorithm with window size $d^t$, it is sufficient that

\[ m \ge 1 + \max_{1 \le k \le \min(f, d^t)} \min_{p, q} g(k, p, q). \qquad (9) \]
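Condition (9) can likewise be evaluated by enumeration for small parameters. The sketch below is an illustration only: it assumes the crosstalk-free cost $g(k, p, q)$ takes the form reconstructed from the three $\alpha/\beta$ cases and two $\gamma/\delta$ cases of Section 5.2, and the names `cost_cf` and `planes_bound_cf` are hypothetical.

```python
def cost_cf(k, p, q, n, d, t, f):
    """Assumed dual objective g(k, p, q) in the crosstalk-free setting."""
    eps = f * (d ** p - 1)
    ceil_half = -(-n // 2)
    if t >= ceil_half:                   # Case 1
        ab = ((n - t - p) * (d ** (n - t + 1) - d ** (n - t))
              - d ** (n - t) + d ** p)
    elif p + 1 <= t:                     # Case 2: p + 1 <= t <= ceil(n/2) - 1
        ab = ((t - p) * (d ** (n - t + 1) - d ** (n - t))
              + d ** (n + p - 2 * t) - d ** t)
    else:                                # Case 3: t <= p
        ab = d ** (n - p) - d ** t
    if q == n - t:
        gd = d ** t - k
    else:                                # n - t + 1 <= q <= n
        gd = d ** q - d ** p + min(d ** t - k, k * (d ** (n - q) - 1))
    return eps + ab + gd


def planes_bound_cf(n, d, t, f):
    """Right-hand side of (9): 1 + max over k of min over (p, q) of g(k, p, q)."""
    if not 0 <= t < n:
        raise ValueError("this sketch only covers 0 <= t < n")
    worst = max(
        min(
            cost_cf(k, p, q, n, d, t, f)
            for p in range(0, n - t)          # 0 <= p <= n - t - 1
            for q in range(n - t, n + 1)      # n - t <= q <= n
        )
        for k in range(1, min(f, d ** t) + 1)
    )
    return 1 + worst


# d = 2, n = 4, general multicast (f = 16), window sizes d^1 and d^2.
cf_t1 = planes_bound_cf(4, 2, 1, 16)
cf_t2 = planes_bound_cf(4, 2, 2, 16)
```

For these inputs the enumeration gives 12 planes for $t = 1$ and 9 planes for $t = 2 = n/2$, matching the expressions $d^{n-2t} + t\,d^{n-t}(d-1)$ and $d^{n-t}[(n-t)(d-1) - 1] + d^t + 1$ respectively.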



5.3 Selecting the best dual-feasible solution 



The proof of the following technical lemma is similar to that of Lemma 4.7 and thus we omit it.

Lemma 5.4. Let $d, n, k$ be positive integers, and $x = \lfloor \log_d k \rfloor$. Then, the following function

\[ h(k) = d^{\lfloor (n+x+1)/2 \rfloor} + k \left( d^{\,n - \lfloor (n+x+1)/2 \rfloor} - 1 \right) \qquad (10) \]

is non-decreasing in $k$.
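One way to see why the omitted proof is "similar" is to compare the two functions directly. This is a sketch under the assumption that (7) and (10) have the forms written below, with $h^{(7)}_n$ and $h^{(10)}_n$ used here only as hypothetical labels for the two functions with their dependence on $n$ made explicit:

```latex
\begin{aligned}
h^{(7)}_{n}(k)  &= d^{\lfloor (n+x)/2 \rfloor} + k\bigl(d^{\,n - \lfloor (n+x)/2 \rfloor - 1} - 1\bigr), \\
h^{(10)}_{n}(k) &= d^{\lfloor (n+x+1)/2 \rfloor} + k\bigl(d^{\,n - \lfloor (n+x+1)/2 \rfloor} - 1\bigr) \\
                &= d^{\lfloor ((n+1)+x)/2 \rfloor} + k\bigl(d^{\,(n+1) - \lfloor ((n+1)+x)/2 \rfloor - 1} - 1\bigr)
                 = h^{(7)}_{n+1}(k).
\end{aligned}
```

Under that reading, the monotonicity of (10) follows from Lemma 4.7 applied with $n + 1$ in place of $n$.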
The objective value $g(k, p, q)$

For $t \ge \lceil n/2 \rceil$ and $q = n - t$,

\[ g(k,p,q) = f(d^p - 1) + (n - t - p)(d^{n-t+1} - d^{n-t}) - d^{n-t} + d^p + d^t - k. \]

For $t \ge \lceil n/2 \rceil$ and $q > n - t$,

\[ g(k,p,q) = f(d^p - 1) + (n - t - p)(d^{n-t+1} - d^{n-t}) - d^{n-t} + d^{q} + \min\{d^t - k,\ k(d^{n-q} - 1)\}. \]

For $p + 1 \le t \le \lceil n/2 \rceil - 1$ and $q = n - t$,

\[ g(k,p,q) = f(d^p - 1) + (t - p)(d^{n-t+1} - d^{n-t}) + d^{n+p-2t} - k. \]

For $p + 1 \le t \le \lceil n/2 \rceil - 1$ and $q > n - t$,

\[ g(k,p,q) = f(d^p - 1) + (t - p)(d^{n-t+1} - d^{n-t}) + d^{n+p-2t} - d^t + d^{q} - d^p + \min\{d^t - k,\ k(d^{n-q} - 1)\}. \]

For $t \le p$ and $q = n - t$,

\[ g(k,p,q) = f(d^p - 1) + d^{n-p} - k. \]

For $t \le p$ and $q > n - t$,

\[ g(k,p,q) = f(d^p - 1) + d^{n-p} - d^t + d^{q} - d^p + \min\{d^t - k,\ k(d^{n-q} - 1)\}. \]

Figure 7: The dual objective value of the family of dual-feasible solutions.

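The upper bound $G(t, f)$

To shorten the notation, let $r = \lfloor \log_d f \rfloor$.

\[
G(t,f) =
\begin{cases}
d^{n-t}[(n-t)(d-1) - 1] + d^t - d^{2t-n-2}(d-1) & t > n/2,\ r \ge \max\{2t-n-2,\ n-t+1\} \\[4pt]
f\left(d^{n-t-r} - 1\right) + r\,d^{n-t}(d-1) - d^{n-t} + d^{\lfloor (n+r+1)/2 \rfloor} + f\left(d^{\,n - \lfloor (n+r+1)/2 \rfloor} - 1\right) & t > n/2,\ r \le \min\{2t-n-3,\ n-t\} \\[4pt]
d^{n-t}[(n-t)(d-1) - 1] + d^{\lfloor (n+r+1)/2 \rfloor} + f\left(d^{\,n - \lfloor (n+r+1)/2 \rfloor} - 1\right) & t > n/2,\ n-t+1 \le r \le 2t-n-3 \\[4pt]
f\left(d^{n-t-r} - 1\right) + d^{n-t}[r(d-1) - 1] + d^t - d^{2t-n-2}(d-1) & t > n/2,\ 2t-n-2 \le r \le n-t \\[4pt]
d^{n-t}[(n-t)(d-1) - 1] + d^t & t = n/2 \\[4pt]
f\left(d^{\lceil (n-r-1)/2 \rceil} - 1\right) + d^{\,n - \lceil (n-r-1)/2 \rceil} - 1 & t < n/2,\ r \le n-2t \text{ and } f < d^{n-2t}(d-1) \\[4pt]
f\left(d^{t-1} - 1\right) + d^{n-t-1}(d^2 - d + 1) - 1 & t < n/2,\ r \le n-2t,\ f \ge d^{n-2t}(d-1) \\[4pt]
f\left(d^{n-t-r} - 1\right) + (2t - n + r)(d-1)\,d^{n-t} + d^{2n-3t-r} - 1 & t < n/2,\ n-2t+1 \le r \le n-t \\[4pt]
t(d-1)\,d^{n-t} + d^{n-2t} - 1 & t < n/2,\ n-t < r
\end{cases}
\]

Figure 8: We show in Theorem 5.5 that $G(t, f) \ge \max_k \min_{p,q} g(k, p, q)$.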



Theorem 5.5. The $\log_d(N, 0, m)$ network is crosstalk-free nonblocking under the window algorithm with window size $d^t$ if $m \ge 1 + G(t, f)$, where $G(t, f)$ is defined in Figure 8.

Proof. We specify for each $k$ how to set the values $p_k$ and $q_k$. The straightforward task of verifying that $g(k, p_k, q_k) \le G(t, f)$ is mostly omitted due to space constraints.

Suppose $t > n/2$, i.e. $2t \ge n + 1$. We consider four cases as follows. In all cases, define an integral variable $x = \lfloor \log_d k \rfloor$.

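• Case 1. $r \ge \max\{2t - n - 2,\ n - t + 1\}$.

If $k \ge d^{2t-n-2}(d - 1) + 1$ then pick $p_k = 0$ and $q_k = n - t$. On the other hand, if $k \le d^{2t-n-2}(d - 1)$ then we pick $p_k = 0$ and $q_k = \lfloor (n+x+1)/2 \rfloor \ge n - t$. Note that $k d^{n - q_k} \le d^t$. Thus, recalling from Lemma 5.4 that the function $h(k)$ defined in (10) is non-decreasing in $k$, we have

\[
\begin{aligned}
g(k, p_k, q_k) &= (n-t)(d^{n-t+1} - d^{n-t}) - d^{n-t} + d^{\lfloor (n+x+1)/2 \rfloor} + \min\{d^t - k,\ k(d^{\,n - \lfloor (n+x+1)/2 \rfloor} - 1)\} \\
&= (n-t)(d^{n-t+1} - d^{n-t}) - d^{n-t} + d^{\lfloor (n+x+1)/2 \rfloor} + k\left(d^{\,n - \lfloor (n+x+1)/2 \rfloor} - 1\right) \\
&= (n-t)(d^{n-t+1} - d^{n-t}) - d^{n-t} + h(k) \\
&\le (n-t)(d^{n-t+1} - d^{n-t}) - d^{n-t} + h\left(d^{2t-n-2}(d-1)\right) \\
&= d^{n-t}[(n-t)(d-1) - 1] + d^t - d^{2t-n-2}(d-1).
\end{aligned}
\]

• Case 2. $r \le \min\{2t - n - 3,\ n - t\}$. In this case we set $p_k = n - t - r \le t - 1$ and $q_k = \lfloor (n+x+1)/2 \rfloor \ge n - t$.

• Case 3. $n - t + 1 \le r \le 2t - n - 3$. In this case we set $p_k = 0$ and $q_k = \lfloor (n+x+1)/2 \rfloor \ge n - t$.

• Case 4. $2t - n - 2 \le r \le n - t$. In this case we set $p_k = n - t - r$ and consider two sub-cases as in Case 1: $k \ge d^{2t-n-2}(d - 1) + 1$ or $k \le d^{2t-n-2}(d - 1)$.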

When $t = n/2$, we simply set $q_k = n - t$ and $p_k = 0$. Then, since $k \ge 1$,

\[ g(k, p_k, q_k) \le d^{n-t}[(n-t)(d-1) - 1] + d^t. \]

Finally, suppose $t < n/2$. We will always pick $q_k = n - t$ in this situation. Also consider four cases:

• Case 1. $r \le n - 2t$ and $f < d^{n-2t}(d - 1)$: set $p_k = \lceil (n-r-1)/2 \rceil \ge t$.

• Case 2. $r \le n - 2t$ and $f \ge d^{n-2t}(d - 1)$: set $p_k = t - 1$.

• Case 3. $n - 2t + 1 \le r \le n - t$: set $p_k = n - t - r$.

• Case 4. $n - t < r$: set $p_k = 0$.

□ 



Corollary 5.6 (Theorem 1 in [23]). The $d$-ary multi-log network $\log_d(N, 0, m)$ is crosstalk-free wide-sense nonblocking with respect to the window algorithm with window size $d^t$ if

\[
m \ge
\begin{cases}
d^{n-2t} + t\,d^{n-t}(d - 1) & t < n/2 \\[4pt]
d^{n-t}[(n-t)(d-1) - 1] + d^t + 1 & t = n/2 \\[4pt]
d^{n-t}[(n-t)(d-1) - 1] + d^t - d^{2t-n-2}(d - 1) + 1 & t > n/2.
\end{cases}
\]